1) UCD-Generator is an application that uses natural language processing to automatically generate use case diagrams from plain English requirements.
2) It analyzes the text using LESSA, which performs tokenization, part-of-speech tagging, and meaning understanding to extract actors, actions, and objects.
3) It then generates use case diagrams in two phases - first extracting information, then drawing the diagrams based on that information. Experiments showed it could accurately generate diagrams for 85% of scenarios.
UCD Generator (ICIET 2007)
IEEE International Conference on Information and Emerging Technologies (IEEE-ICIET), Karachi, Pakistan
UCD-Generator – A LESSA Application for Use Case Design
Imran S. Bajwa (1), Irfan Hyder (2)
(1) CISUC, Department of Informatics Engineering, University of Coimbra, 3030 Coimbra, Portugal
(2) PAF-KIET, Karachi, Pakistan
imran@dei.uc.pt, hyder@pafkiet.edu.pk
Abstract
In object-oriented design, use cases are among the most important but also most complex components to design. UCD-Generator is a LESSA-based application that simplifies this process. It is an automated system that interacts directly with the user. This paper presents LESSA, a natural language processing approach that automatically understands natural language text and extracts the required information, which is then used to draw use case diagrams. The user writes his interface-based preferences in simple English in a few paragraphs, and the designed system analyzes the given script. After analysis and extraction of the associated information, the designed system draws the use case diagrams. Conventional CASE tools require a complete understanding of the intended business scenario and considerable extra time and effort from the system analyst while creating, arranging, labeling and finishing the use case diagrams. The designed system provides a quick and reliable way to generate use case diagrams, saving the time and budget of both the user and the system analyst.
Keywords: Information Extraction, Use Case Design, Natural Language Processing, Automatic Diagrams Generation, Text Understanding, UML Diagrams.
1. Introduction
UML (Unified Modeling Language) provides several types of diagrams that are used to simplify and clarify the conception of an application under development. Use case diagrams are one of these. A use case diagram illustrates the functionality the system supports, and software development teams use it to visualize the functional requirements of a system. A use case diagram describes the relationship of the actors to the main processes in the system and also the relationships among different use cases. An actor is a human being that interacts with the designed system [1]. Each use case provides one or more scenarios that convey how the system should interact with the end user or another system to achieve a specific business goal. Use cases are a tool for capturing the functional requirements of a designed system. The functional requirements principally represent the intended behavior of a system. This behavior may be expressed as services, tasks or functions provided by the system [2].
The patterns of modeling and mapping business logic in software engineering have changed completely in recent times. These days all major phases of software engineering follow the rules defined by the object-oriented design paradigm. All phases of software engineering are moving away from older conventions, and new paradigms are more popular [3]. The same is true of the software analysis process, which uses the Unified Modeling Language to map and model the user requirements. Analysis is the key process in building modern information system applications and the basis for robust software design and development.
Use case design is a major part of UML documentation and is also a difficult phase of object-oriented design, because use cases are typically generated according to the requirements and preferences of a particular user. A system analyst has to spend a lot of time in correspondence with the user to understand his specific preferences. The other problem specifically addressed in this research is software analysis and design using available CASE tools, which are merely paint-like tools, such as Visual UML, GD Pro, Smart Draw and Rational Rose [5]. Using the heavily overloaded interfaces of these CASE tools is a vexing problem. Generating UML diagrams through these tools is a difficult, lengthy and time-consuming process. Consequently, an automated tool was required that can acquire the preferences directly from the user and generate the corresponding use cases with improved output in minimum time.
In this paper, a natural language processing approach is presented for semantic analysis of text and its conversion into use case diagrams. The module that implements UCD-Generator is called
LESSA (Language Engineering System for Semantic Analysis). LESSA is an automated rule-based approach that reads natural language text, understands its meaning and extracts the required information. LESSA performs lexical, syntactic and semantic analysis of the natural language text.
Section 2 briefly describes the basics of LESSA. Section 3 illustrates the processing steps of the UCD-Generator system. Section 4 presents some experiments, and the last section compares the approach with other approaches to text understanding and presents the conclusions of the research.
2. Proposed Approach
To address the problem targeted by this research, a supportive system was required that can assist both users and software engineers. The functionality of the conducted research is domain specific, but it can easily be extended in the future as requirements demand. The current system is able to understand and map user preferences after reading the given business scenario in plain text, and then to draw the use case diagrams. An Integrated Development Environment has also been provided for user interaction and efficient input and output.
2.1. Automated Object-Oriented Design
Among the phases of software engineering, the analysis and design of an information system is concerned with understanding the system and its primary components [2]. Typically, design is concerned with managing and controlling complexity in a domain. In software engineering, design methods provide various notations, usually graphical ones. These notations make it possible to record and communicate design decisions. Object-oriented design has superseded traditional analysis and design techniques such as structured design and data-driven design [5]. Compared to older design paradigms, object-oriented design models every active entity of the problem domain using the concept of objects. Objects have:
– Object State (shape and condition)
– Object Behaviour (what they perform)
The major emphasis of the conducted research was automatic identification of objects from a problem domain. The user provides input text in English related to the business domain. After lexical analysis of the text, syntax analysis is performed at word level to recognize the word category [6]. First, the available lexicons are categorized into nouns, pronouns, prepositions, adverbs, articles, conjunctions, etc. The syntactic analysis performed by the program must then be able to isolate subjects, verbs, objects, adverbs, adjectives and various other complements. It is a somewhat complex, multipart procedure.
A procedure is defined that understands these sentences and discovers the actor, interaction, goals and intended objects as follows:
Actor: The actor causes the action to occur. In "user fills the form", the user is the actor who performs the task of form filling.
Action: This component defines the ultimate goal of a certain use case. For example, in "user fills form to get registered", getting registered is the goal of the intended use case.
Object: The thematic object is the object the sentence is really all about, typically the object undergoing a change. Often the thematic object is the same as the syntactic direct object, as in "User presses the key." Here the key is the thematic object.
2.2. LESSA
LESSA (Language Engineering System for Semantic Analysis) is a natural language processing approach that is primarily used to understand and analyze natural language text. LESSA is based on a rule-based algorithm whose rules are defined according to English grammar. LESSA has the following major modules:
2.2.1. Tokenization
This module reads the natural language text and converts it into lexical tokens. These lexical tokens are the processing elements of LESSA and are passed to the next phase for further processing.
2.2.2. POS Tagging
This module receives the lexical tokens from the previous module and applies a rule-based algorithm to them. The major function of this module is to identify and extract the various parts of speech in the given text. This phase is also called parts-of-speech tagging [13].
2.2.3. Meaning Understanding
This last module understands the semantics of the given text and, on the basis of the extracted semantics, identifies the object, subject and adverb parts of the sentences. The following example describes the working of the LESSA module.
"User fills the form."
Typically, LESSA reads this text and performs lexical analysis to find the parts of speech, and then analyzes further to find the actor, action and object in the given text. For this example, the output is as follows.
Lexicons   Phase-I   Phase-II
User       Noun      Actor
fills      Verb      Action
the        Article   -------
form       Noun      Object
This is the final output of the syntax assessment phase: all subject nouns are marked as actors, verbs are marked as actions, and all object nouns are marked as objects in the scenario. In the above example there is one actor, 'User'. Besides the actor, there is an action, 'fills', and finally an object, 'form', on which the action is performed.
3. UCD-Generator
The designed system, UCD-Generator (Use Case Diagram Generator), uses natural language processing techniques to generate use case diagrams. Generating a use case diagram is a multi-phase process. There are two main phases:
a. Information Extraction
b. Diagram Generation
3.1. Information Extraction
In the information extraction phase, the text input given by the user is analyzed to extract the parts of speech. This phase is also called parts-of-speech tagging, in which pronouns, verbs, adjectives, articles, etc. are identified in a sentence. LESSA performs this step using a rule-based algorithm for syntax analysis. After syntax analysis, the parts of each sentence (string) are categorized at sentence level, for example as actor and object. Finally, the categorized parts of the sentence are represented in the respective use case diagrams.
3.2. Diagram Generation
In this phase, the information extracted in the previous phase is first analyzed, and the use case diagrams are then generated from it. Small diagrams are combined to generate the complete use case diagram. The designed system follows four steps to generate a use case diagram.
Step 1 – The actors are identified. Actors are the people who use the various functions of the intended system, e.g. "User hits the key". Here the actor is the user.
Step 2 – One actor is picked at a time and named; the actor is named "User" in this example.
Figure 3.1 – Actor in a Use Case
Step 3 – The particular actions of an actor with the system are defined. These are the actions required by an actor, and they are categorized as use cases in the system.
Step 4 – For each of these use cases, decide on the most usual course when that actor is using the system. This is the basic course or description of the use case.
Figure 3.2 – Generating a complete Use Case
3.3. UCD-Generator Architecture
The complete architecture of UCD-Generator has been divided into three distinct parts. The first part, the natural language processing engine, is the module that performs lexical and syntax analysis and extracts information such as who the actor is, what the object is and what the related description of the particular use case is. The details are shown below:
Figure 3.3 - Describe the Basic Course for each Use Case
The designed system has an insightful ability to understand natural language text and draw the intended use case diagrams. The current designed system
incorporates the capability to understand and map user preferences after reading the given business scenario in plain text, and then to draw the use case diagrams. An Integrated Development Environment has also been provided for user interaction and efficient input and output.
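The four diagram-generation steps of Section 3.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how UCD-Generator renders its diagrams, so the sketch emits PlantUML-style text instead of drawing, and the function name `build_use_case_diagram` and the (actor, action, object) triple format are hypothetical.

```python
def build_use_case_diagram(triples):
    """Turn (actor, action, object) triples into PlantUML-style use case text.

    Assumption: the triples come from an earlier information extraction
    phase; the output format is chosen for illustration only.
    """
    # Step 1: identify the actors, keeping the order in which they appear.
    use_cases_by_actor = {}
    for actor, action, obj in triples:
        # Steps 3 and 4: each action of an actor, together with its object,
        # becomes one use case whose label is its basic course.
        label = f"{action.capitalize()} the {obj}"
        use_cases_by_actor.setdefault(actor, []).append(label)

    lines = ["@startuml"]
    # Step 2: pick one actor at a time and name it.
    for actor, labels in use_cases_by_actor.items():
        lines.append(f"actor {actor}")
        for label in labels:
            lines.append(f"{actor} --> ({label})")
    lines.append("@enduml")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_use_case_diagram([("User", "fills", "form")]))
```

For the single triple ("User", "fills", "form"), the sketch produces one actor named User linked to one use case labelled "Fills the form", which corresponds to the content of Figures 3.1 and 3.2.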
4. Experiments and Analysis
An application, UCD-Generator, was designed to test the results of the proposed approach. This application has two parts. The first part is the input acquisition part, in which an input window is available. Input text can be provided here for further processing, as shown in Figure 4.1.
Figure 4.1 – Use Case Input Dialog Box
The input dialog box of UCD-Generator has two tabs. The first tab is 'UseCase Input' and the second tab is 'Samples'. The first tab provides common tips for generating the system use cases. For better accuracy, the user is advised to write only one or two use cases at a time. The user is also expected to write grammatically correct English, as this improves the system's accuracy. Complete information about each use case should be provided: actor, action and action description. In the 'UseCase Input' window, 'Scenario Help' is also provided to assist the user in providing precise input. The 'Samples' tab has some sample inputs and the output generated for them by UCD-Generator.
In this example the input text was "First, the user fills the form. Then he submits the form for further processing." This input contains two use cases of a single functionality. When the user clicks the 'Draw Diagram' button, the text is handed over to LESSA for semantic analysis. The LESSA module first reads the input text and converts it into lexical tokens. These tokens are further processed to find the parts of speech. After parts-of-speech tagging, the text is further analyzed to extract its meaning. The extracted information is shown in Figure 4.2.
Figure 4.2 – Use Case Information Dialog Box
The use case information is displayed in the 'Use Case View' window. This window has two tabs. The 'Use Cases' tab shows the generated use cases. The other tab, 'Use Case Info', shows the information extracted from the input text: the use case actor, use case action and use case object. After this information has been extracted, a use case diagram can be generated.
Figure 4.3 shows the final output window, which displays the generated use case in the 'Use Case' window. The generated use case is based on the information displayed in the 'Use Case Info' window. An additional facility lets the user modify the extracted information for more accurate use case diagram generation: the user can select the accurate information from the 'UseCase Info' tab to improve the accuracy.
Figure 4.3 – Use Case final Output Dialog Box
The user can improve the diagram's accuracy by repeating this process with technically and grammatically improved input text. About 50 different real-time scenarios were provided to the system to check its accuracy. 85% of the scenarios were accurately translated into use case diagrams. Accuracy improves further if the extracted information is filtered before generating the diagrams.
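To make the processing of the sample input concrete, the following Python sketch walks through tokenization, rule-based tagging and actor/action/object extraction for the two sentences. It is a toy approximation, not LESSA itself: the paper does not list LESSA's rules, so the small `LEXICON`, the function names and the rule that a pronoun refers to the previous actor are all assumptions made for illustration.

```python
import re

# Toy lexicon: an assumption for illustration, not LESSA's actual rule base.
LEXICON = {
    "user": "Noun", "form": "Noun",
    "fills": "Verb", "submits": "Verb",
    "the": "Article",
    "he": "Pronoun", "she": "Pronoun",
    "first": "Adverb", "then": "Adverb",
    "for": "Preposition",
}


def tokenize(sentence):
    """Split a sentence into lexical tokens (alphabetic words only)."""
    return re.findall(r"[A-Za-z]+", sentence)


def tag(tokens):
    """Attach a part of speech to each token by lexicon lookup."""
    return [(tok, LEXICON.get(tok.lower(), "Unknown")) for tok in tokens]


def extract_use_cases(text):
    """Return one (actor, action, object) triple per complete sentence."""
    use_cases = []
    last_actor = None
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        actor = action = obj = None
        for word, pos in tag(tokenize(sentence)):
            if action is None:
                if pos == "Noun":
                    actor = word.capitalize()   # subject noun -> actor
                elif pos == "Pronoun":
                    actor = last_actor          # assumed: refers to previous actor
                elif pos == "Verb":
                    action = word               # verb -> action
            elif obj is None and pos == "Noun":
                obj = word                      # first noun after the verb -> object
        if actor and action and obj:
            use_cases.append((actor, action, obj))
            last_actor = actor
    return use_cases


if __name__ == "__main__":
    sample = "First, the user fills the form. Then he submits the form for further processing."
    for triple in extract_use_cases(sample):
        print(triple)
```

On the sample input the sketch yields two triples with the same actor, one for filling the form and one for submitting it, matching the two use cases described in the text. Sentences containing words outside the toy lexicon are simply skipped if no complete triple is found, which is one reason grammatically simple input gives better results.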
5. CONCLUSION
This research concerns the automatic generation of use case diagrams after reading and analyzing a scenario provided by the user in English. The designed system can find the classes and objects and their attributes and operations using an artificial intelligence technique, natural language processing. Then the use case diagrams are drawn. The accuracy of the software is up to about 85% with the involvement of the software engineer, provided that the input is properly formatted. The given scenario should be complete and written in simple and correct English. Within the scope of our project, the software performs a complete analysis of the scenario to find the classes, their attributes and operations. A well-designed graphical user interface has also been provided for the user to enter the input scenario properly and to generate use case diagrams, which can ultimately become part of the object-oriented documentation.
REFERENCES
[1] Anda, B., Sjøberg, D., and Jørgensen, M. “Quality and Understandability in Use Case Models”, In Proc. 13th European Conference on Object-Oriented Programming (ECOOP'2001), Jørgen Lindskov Knudsen (editor), June 18-22 2001, Budapest, Hungary, LNCS 2072, Springer Verlag, pp. 402-428.
[2] Ruth Malan and Dana Bredemeyer, (2001) “Functional Requirements and Use Cases”, Bredemeyer Consulting, ruth_malan@bredemeyer.com
[3] Imran Sarwar Bajwa, M. Abbas Choudhary (2006), “A Language Engineering System for Automatic Conversion of English Cyber Data into Urdu Websites”, Proceedings of International Conference on Computer Applications 2006, Yangoon, Myanmar
[4] Imran Sarwar Bajwa, M. Abbas Choudhary (2006), “Speech Language context understanding using a Rule Based Framework”, International Conference on Intelligent Systems and Knowledge Engineering, Shanghai, China
[5] Chomsky, N. (1965) “Aspects of the Theory of Syntax”. MIT Press, Cambridge, Mass, 1965.
[6] Chow, C., & Liu, C. (1968) “Approximating discrete probability distributions with dependence trees”. IEEE Transactions on Information Theory, 1968, IT-14(3), 462–467.
[7] Krovetz, R., & Croft, W. B. (1992) “Lexical ambiguity and information retrieval”, ACM Transactions on Information Systems, 10, 1992, pp. 115–141
[8] Salton, G., & McGill, M. (1995) “Introduction to Modern Information Retrieval”, McGraw-Hill, New York, 1995
[9] Maron, M. E. & Kuhns, J. L. (1997) “On relevance, probabilistic indexing, and information retrieval”, Journal of the ACM, 1997, 7, 216–244.
[10] Losee, R. M. (1988) “Parameter estimation for probabilistic document retrieval models”. Journal of the American Society for Information Science, 39(1), 1988, pp. 8–16.
[11] S. Kok and P. Domingos, (2005) “Learning the structure of Markov logic networks”, In Proc. of ICML-05, Bonn, Germany, 2005. ACM Press, pp. 441–448
[12] S. Kok, P. Singla, M. Richardson, and P. Domingos, (2005) “The Alchemy system for statistical relational AI”, Technical report, Department of Computer Science and Engineering, http://www.cs.washington.edu/ai/alchemy.
[13] Imran Sarwar Bajwa, M. Abbas Choudhary, (2006) “A Rule Based System for Speech Language Context Understanding”, Journal of Donghua University (English Edition), 23 (6), pp. 39-42.
[14] Burns, J., & Madey, G. (2001), “A framework for effective user interface design for web-based electronic commerce applications”, Informing Science, 4 (2), 67-75
[15] Woo, H., & Robinson, W. (2002). “A light-weight approach to the reuse of use-cases specifications”, In Proceedings of the 5th annual conference of the Southern Association for Information Systems (pp. 330-336), Savannah: SAIS
[16] Malaisé Véronique, Zweigen. Pierre, Bachimont Bruno, (2005) “Mining Defining Contexts to Help Structuring Differential Ontologies”, Terminology, 11:1
[17] Imran Sarwar Bajwa, Riaz-Ul-Amin, M. Asif Naeem, Muhammad Nawaz (2006), “Web Information Mining Framework using XML Based Knowledge Representation Engine”, Proceedings of 2nd International Conference on Software Engineering, 2006, Lahore, Pakistan
[18] Condamines, Anne and Rebeyrolle, Josette. (2001). “Searching for and identifying conceptual relationships via a corpus based approach to a Terminological Knowledge Base (CTKB): Method and Results”, Recent Advances in Computational Terminology, pp. 127-148
[19] Imran Sarwar Bajwa, M. Abbas Choudhary (2006), “A Language Engineering System for Graphical User Interface Design (LESGUID): A Rule based Approach”, Proceedings of IEEE 2nd International Conferences on Information & Communication Technologies, Damascus, Syria
[20] L. R. Tang and R. J. Mooney, (2001), “Using Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing”, Proceedings of the 12th European Conference on Machine Learning (ECML-2001), Freiburg, Germany, pages 466–477.
[21] Drouin Patrick. (2004). “Detection of Domain Specific Terminology Using Corpora Comparison”, Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal
[22] J. M. Zelle and R. J. Mooney, (1993), “Learning semantic grammars with constructive inductive logic programming”, in: Proceedings of the 11th National Conference on Artificial Intelligence (AAAI Press/MIT Press, Washington, D.C.), pp. 817–822.
[23] Khoo Christopher, Chan Syin, Niu Yun, (2002) “The Many Facets of the Cause-Effect Relation”, The
Semantics of Relationships. Kluwer Academic Press. pp. 51-70
[24] Gómez-Pérez Asunción, F. Mariano, C. Oscar, (2004) “Ontological Engineering: with examples from the areas of Knowledge Management”, e-Commerce and the Semantic Web. Springer