Development of algorithms for analysing energy consumption for the Smart Grid
Final Report of “External Internships in CITSEM”
Date: 12/01/2015
Author: Marta de la Cruz Martos
Tutor: Pedro Castillejo
Working Group: GRyS
Resumen
This document explains the development of the internship carried out at the Centre for Research in Software Technologies and Multimedia Systems for Sustainability (CITSEM), within the Next-Generation Networks and Services group (GRyS). The internship is framed within the I3RES project, whose objective is to integrate renewable energy into the distribution grid by incorporating artificial intelligence. Within this project, the internship has focused on the development of algorithms to analyse energy consumption and to perform a classification of users.
Abstract
This document explains the development of the internship carried out at the Centre for Research in Software Technologies and Multimedia Systems for Sustainability (CITSEM), within the Next-Generation Networks and Services group (GRyS). The internship is framed within the I3RES project, which aims to integrate renewable energy into the distribution grid by incorporating artificial intelligence. Within this project, the internship has focused on developing algorithms for analysing the energy consumption of the smart grid and for classifying users.
1. Introduction
The main objective of this internship is to develop algorithms to analyse energy consumption for the Smart Grid. A Smart Grid is an evolution of the conventional electricity grid that integrates intelligent solutions to achieve more efficient management and to optimize the production and distribution of electricity. To make this possible, a European project called i3RES (ICT based Intelligent management of Integrated RES for the Smart Grid optimal operation) is being developed. A description of it can be found on the project website referenced in [1]. Its main objective is cited below:
“Recognising the need, within the energy industry, to optimize the integration of renewable energy sources and
new consumer energy needs in connection with socio-economic challenges, I3RES aims to integrate renewable
energy sources in the distribution grid by incorporating intelligence at three different levels: in the integration
of Renewable Energy Sources (RES) and the development of control and management mechanisms that reduce
the impact of its intermittency; in the facilitation of the participation of all actors in the electricity market; and
in the overall operation of the network.
I3RES main goal is to develop a management tool for the distribution grid underpinned by:
1. A monitoring system that integrates information from already installed systems (e.g. SCADA, EMS and
smart meters);
2. Energy production forecasting and network management algorithms that assist the distribution company
in the management of massively distributed RES production and large scale RES production within the
distribution network;
3. Data mining and artificial intelligence to analyse consumers' energy demand and production in the
distribution grid.”
The following objectives were set for the internship:
1. To study the technologies related to the project.
2. To participate in the specification and design of software components.
3. To develop and implement algorithms.
4. To participate in the process of debugging and prototyping of components.
5. To prepare reports related to the tasks listed above.
The goal of this document is to present the work carried out during the internship at CITSEM. First, a study of the state of the art is presented. This study is divided into two parts: the first part covers distributed systems, because the project is built on top of them; the second part covers database technologies, which are the main focus of the internship. The development work is then explained, and the results obtained are shown. The document closes with the conclusions, followed by the future work.
2. State of the Art
2.1 Distributed Systems
2.1.1 Introduction
This part of the document is the result of the study of the state of the art on distributed systems. First, a distributed system is defined and some of its features, properties and challenges are highlighted. A summary table with its advantages and disadvantages with respect to other systems is also included.
After the state of the art on distributed systems, communication protocols are presented. A brief introduction to the evolution of distributed systems is given first, and the study then focuses on each of the different platforms and models. To conclude this section, a summary table gathers the most important aspects of each protocol.
The next section is about distributed systems architectures, where the study focuses on each architecture.
Finally, a brief conclusion recaps the main points of distributed systems.
2.1.2 Distributed Systems
Neuman defines it in [2] as:
“A distributed system is a collection of computers, connected by a computer network working together to
collectively implement some minimal set of services.”
o Properties and Challenges
Distributed systems should attempt to provide certain properties; Coulouris identifies the following challenges in [3]:
Heterogeneity: it applies to networks, computer hardware, operating systems, programming languages and so on.
Openness: a computer system is open if it can be extended in different ways. The terms open and closed can be applied both to hardware components (for example, peripherals) and to software components (for example, adding features to the operating system). The basic features of an open distributed system are that its interfaces are made public, that it can be constructed from heterogeneous hardware and software, and that it is based on providing a uniform inter-process communication mechanism and published interfaces for accessing shared resources.
Security: Many of the information resources that are made available and maintained in distributed systems have
a high intrinsic value to their users. Their security is therefore of considerable importance. Security for
information resources has three components: confidentiality (protection against disclosure to unauthorized
individuals), integrity (protection against alteration or corruption), and availability (protection against
interference with the means to access the resources).
Scalability: a system is scalable if it can handle the addition of users and resources without suffering a noticeable
loss of performance or increase in administrative complexity. [2]
Failure handling: information must be stored in a set of machines (redundancy), so that if a machine fails, another can take over its functions. Another solution is recovery software: designing software that is able to roll back the state of permanent data to the state it had before the failure occurred.
Concurrency: when multiple processes run on a single machine, we say that they run concurrently. On a single-core processor, concurrency is only apparent: the processor interleaves the processes. On a machine with N processors, up to N processes can truly run in parallel.
Transparency: the entire system should behave in the same way at every point of the network. It is the concealment from the
user and the application programmer of the separation of the components of a distributed system. The forms of
transparency are, for example, access transparency, location transparency, migration transparency, replication
transparency, failure transparency, concurrency transparency and so on.
Quality of Service (QoS): the main non-functional properties of systems that affect the QoS experienced by clients and users are reliability, security and performance. Another important aspect of service quality is adaptability to meet changing system configurations and resource availability.
o Advantages and Disadvantages
Table 1: Advantages and Disadvantages of distributed systems [4]
Advantages:
- Processors are more powerful and cost less: development stations with more capabilities, stations that meet the needs of the users, use of new interfaces.
- Advances in communications technology: availability of communication elements, development of new techniques.
- Sharing of resources: devices (hardware) and programs (software).
- Efficiency and flexibility: quick response, concurrent processes (on multiple computers), use of distributed processing techniques.
- Availability and reliability: the system is not prone to failure (if a component stops working, the availability of the system is not affected), more services that enhance the functionality.
- Modular growth: growth is inherent to the model; new resources can be added quickly without affecting the current ones.

Disadvantages:
- Requirements for greater processing controls.
- Speed of propagation of information (very slow at times).
- Data replication services and services with a chance of failure.
- Greater access and process controls (commit).
- More complex management.
- Costs.
2.1.3 Distributed Systems Communication Protocols.
Hurtado Jara describes in [5] that the first platforms were based on a single central computer which processed all the information (centralized processing, host). If the processing load increased, all the hardware of the central computer had to be replaced, which was very expensive. Moreover, the new graphical user interfaces (GUIs) led to a large increase in traffic, which could collapse the system.
Another centralized model was to interconnect multiple computers which behave as servers connected to a local area network (a group of servers). In this model the problem of saturation appears, for example when several users request a very large file (loss of transmission speed).
The model that predominates today is the client-server model (the client is a machine that requests a certain service, and the server is the one that provides it). It decentralizes the processing and resources of each of the services, as well as the displaying of the GUI.
The protocols are listed below:
o Socket
o RM-ODP
o CORBA
o RMI
o DCOM
o Servlets
o Java Beans
o SOAP
The main protocols are DCOM, CORBA and RMI. The next table shows a comparison between them.
Table 2: Comparison between DCOM, CORBA and RMI [6]
Number of interfaces to objects:
- DCOM: multiple interfaces per object; the method QueryInterface() is used to move between them.
- CORBA: an interface can support multiple inheritance of multiple interfaces.

Identifying interfaces:
- DCOM: Interface ID or Class ID.
- CORBA: name or Implementation Repository (for servers).
- RMI: name or URL in the Registry (for servers).

Methods for complex shapes crossing the interface:
- DCOM and CORBA: the complex forms must be declared in the IDL concerning the interface.
- RMI: any public Java object can be passed as a parameter through the process.

Call to an object:
- DCOM: IUnknown.
- CORBA: each interface inherits from CORBA.Object.
- RMI: all object servers implement java.rmi.Remote.

Identification of remote server objects:
- DCOM: through an interface pointer.
- CORBA: through object references.
- RMI: with an ObjID, which is served while working with the object.

Generating references to remote objects in the server:
- DCOM: in the network protocol, using the object exporter.
- CORBA: in the network protocol, using the Object Adapter.
- RMI: by calling the method UnicastRemoteObject.exportObject(this).

Tasks (declaration of objects, establishment of the skeleton, …):
- DCOM: the server program, or dynamic handling by the COM system.
- CORBA: the constructor implicitly performs these tasks.
- RMI: the RMIRegistry performs the common tasks.

Activation of objects:
- DCOM: CoCreateInstance.
- CORBA: the object is attached to a name or a commercial service.
- RMI: lookup() on the URL of the remote object server.

Requests for objects:
- DCOM: the client needs a pointer to the interface.
- CORBA and RMI: the client needs a reference to the object.

Representation of the object name:
- DCOM: from the Registry.
- CORBA: from the Implementation Repository.
- RMI: from the RMIRegistry.

Passing parameters between client and server:
- DCOM: defined in the interface definition (can be passed by value or by reference).
- CORBA: all interface types by reference and the rest by value.
- RMI: objects implementing java.rmi.Remote interfaces are passed as a remote reference; the others are passed by value.

Garbage collection of references:
- DCOM: distributed garbage collection over the network through pinging mechanisms.
- CORBA: no garbage collection.
- RMI: distributed garbage collection through mechanisms included in the JVM.

Locating and activating an implementation of an object:
- DCOM: done by the SCM (Service Control Manager).
- CORBA: localization by the ORB (Object Request Broker) and activation by the OA (Object Adapter).
- RMI: done by the JVM (Java Virtual Machine).

How the stub is called:
- DCOM: client side, proxy; server side, stub.
- CORBA and RMI: client side, proxy or stub; server side, skeleton.

Coding of objects:
- DCOM: multiple programming languages.
- CORBA: multiple programming languages, provided that ORB libraries are available for that language.
- RMI: only Java.

Where the type information is kept:
- DCOM: Type Library.
- CORBA: Interface Repository.
- RMI: the object contains this information; it can be obtained through the Reflection and Introspection methods.

Return value:
- DCOM: HResult.
- CORBA: exception objects.
- RMI: exceptions.

Platform availability:
- DCOM: it will run on any platform that has implemented the COM service for that platform.
- CORBA: it will run on any platform that has implemented the CORBA ORB for that platform.
- RMI: it will work on any platform that has implemented the JVM for that platform.

Underlying protocol:
- DCOM: ORPC (Object Remote Procedure Call).
- CORBA: IIOP (Internet Inter-ORB Protocol).
- RMI: JRMP (Java Remote Method Protocol).
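To make the RMI entries of the table more concrete, the following is a minimal Java sketch of a remote object. The Echo interface, its implementation and the service name are invented for illustration; only the RMI API calls (java.rmi.Remote, UnicastRemoteObject.exportObject(), the registry and lookup()) correspond to the mechanisms listed in the table, and the calls travel over JRMP.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: it extends java.rmi.Remote and every method declares RemoteException.
interface Echo extends Remote {
    String echo(String msg) throws RemoteException;
}

// Server-side implementation of the remote interface.
class EchoImpl implements Echo {
    public String echo(String msg) { return "echo: " + msg; }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Export the object (this generates its stub) and bind it to a name in the registry.
        Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("EchoService", stub);

        // Client side: look up the stub by name and call it; the call is carried over JRMP.
        Echo remote = (Echo) LocateRegistry.getRegistry("localhost", 1099).lookup("EchoService");
        System.out.println(remote.echo("hello"));
    }
}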
2.1.4 Distributed Systems Architectures
In this section, the different software architectures are studied. The study focuses on a specific architecture, SOA. The most common SOA models are studied, focusing on the ESB model, whose features and products are presented. The different ESB platforms are also cited.
Software architecture is defined in [7] as:
“The word software architecture intuitively denotes the high level structures of a software system. It can be
defined as the set of structures needed to reason about the software system, which comprise the software
elements, the relations between them, and the properties of both elements and relations.
The term software architecture also denotes the set of practices used to select, define or design a software
architecture.
Finally, the term often denotes the documentation of a system's "software architecture". Documenting software
architecture facilitates communication between stakeholders, captures early decisions about the high-level
design, and allows reuse of design components between projects.”
The most important software architectures are [8]:
o Multiprocessor Architecture
o Client-Server Architecture
o Distributed Computing Interorganizational
For reasons of safety and interoperability, distributed computing has been implemented at the organizational level. An organization has multiple servers and shares its load between them. As all the servers are within the same organization, local standards and operational processes can be applied.
More recent models allow interorganizational, rather than merely intraorganizational, distributed computing.
Peer-to-peer computing is based on computations performed on individual nodes of the network.
Service-oriented systems are based on distributed services instead of distributed objects, and rely on XML-based data exchange standards.
Service Oriented Architecture is defined in [9] as:
“A service-oriented architecture (SOA) is the underlying structure supporting communications between
services. SOA defines how two computing entities, such as programs, interact in such a way as to enable one
entity to perform a unit of work on behalf of another entity. Service interactions are defined using a description
language. Each interaction is self-contained and loosely coupled, so that each interaction is independent of any
other interaction.”
According to [10], in SOA the desired functionality of a software process is decomposed into services, which can be distributed across different nodes connected through the network and used in combination. Services are basic units of functionality which operate independently. By combining collections of small modules (services), the applications needed for the relevant business processes are obtained. These modules can be used by users within the organization or outside it. Savings in development effort are achieved because functionality common to different applications is reused. Integration between organizations is also favoured, because the appearance and the level and type of input data for user validation are homogenized.
Collaboration between services is defined in [10] as:
“Collaboration between services consists of determining the sequence of operations that must take place in the interaction between clients and servers. The sequence must respect the established order in order to be valid and, for this to be feasible, a coordination protocol is defined that precisely describes the set of valid sequences.”
Also, two models of collaboration between services are defined: orchestration and choreography.
 The orchestration model: “is based on the existence of a centralized control mechanism that is responsible for directing the activities, with each interaction between services. It allows the definition of an interaction model, but only from the point of view of the controller. The orchestration defines the behaviour and how to carry it out, and all events are monitored centrally.”
 The choreography model: “describes the behaviour to be observed between the interacting parties. Each of the organizations involved in this model independently develops the role it wants to play in the collaboration; the only condition is to respect the "global contract" described by the choreography. Execution and control are the responsibility of the participants.”
Technological SOA components are:
1. Enterprise Service Bus (ESB): where the services are deployed and running.
2. Universal Description, Discovery and Integration (UDDI): is defined in [11] as:
“Universal Description, Discovery and Integration (UDDI, pronounced Yu-diː) is a platform-
independent, Extensible Markup Language (XML)-based registry by which businesses worldwide
can list themselves on the Internet, and a mechanism to register and locate web service applications.
UDDI is an open industry initiative, sponsored by the Organization for the Advancement of
Structured Information Standards (OASIS), for enabling businesses to publish service listings and
discover each other, and to define how the services or software applications interact over the
Internet.”
3. Business Process Management (BPM): component for the orchestration of services in business
processes.
4. Business Activity Monitoring (BAM): for visualization and monitoring of business activities.
2.1.5 Enterprise Service Bus.
ESB is defined in [12] as:
“An enterprise service bus (ESB) is a software architecture model used for designing and implementing
communication between mutually interacting software applications in a service-oriented architecture (SOA).
As a software architectural model for distributed computing it is a specialty variant of the more general client
server model and promotes agility and flexibility with regards to communication between applications. Its
primary use is in enterprise application integration (EAI) of heterogeneous and complex landscapes.”
ESB’s functions are defined in [12]:
 Invocation: support for synchronous and asynchronous transport protocols, service mapping
(locating and binding)
 Routing: addressability, static/deterministic routing, content-based routing, rules-based routing,
policy-based routing
 Mediation: adapters, protocol transformation, service mapping
 Messaging: message-processing, message transformation and message enhancement
 Process choreography: implementation of complex business processes
 Service orchestration: coordination of multiple implementation services exposed as a single,
aggregate service
 Complex event processing: event-interpretation, correlation, pattern-matching
 Other quality of service: security (encryption and signing), reliable delivery, transaction
management
 Management: monitoring, audit, logging, metering, admin console, BAM (BAM is not a
management capability in other words the ESB doesn’t react to a specific threshold. It is a business
service capability surfaced to end users. )
 Agnosticism: general agnosticism to operating-systems and programming-languages; for example,
it should enable interoperability between Java and .NET applications
 Protocol Conversion: comprehensive support for topical communication protocols service
standards
 Message Exchange Patterns: support for various MEPs (Message Exchange Patterns) (for
example: synchronous request/response, asynchronous request/response, send-and-forget,
publish/subscribe)
 Adapters: adapters for supporting integration with legacy systems, possibly based on standards
such as JCA
 Security: a standardized security-model to authorize, authenticate and audit use of the ESB
 Transformation: facilitation of the transformation of data formats and values, including
transformation services (often via XSLT or XQuery) between the formats of the sending application
and the receiving application
 Validation: validation against schemas for sending and receiving messages
 Governance: the ability to apply business rules uniformly
 Enrichment: enriching messages from other sources
 Split and Merge: the splitting and combining of multiple messages and the handling of exceptions
 Abstraction: the provision of a unified abstraction across multiple layers
 Routing and Transformation: routing or transforming messages conditionally, based on a non-
centralized policy (without the need for a central rules-engine)
 Queuing and staging: queuing, holding messages if applications temporarily become unavailable
or work at different speeds
 Commodity Services: provisioning of commonly used functionality as shared services depending
on context
Table 3 shows the benefits and disadvantages of an ESB:
Table 3: Benefits and Disadvantages of ESB [12]
Advantages:
o Increased flexibility; it is easier to change as requirements change.
o It scales from point solutions to enterprise-wide deployment (distributed bus).
o More configuration rather than integration coding.
o No central rules engine, no central broker.
o Incremental patching with zero downtime; the enterprise becomes "refactorable".

Disadvantages:
o Increased overhead.
o Slower communication speed, especially for services that are already compatible.
ESB products are classified in [12]:
 Commercial:
o Adeptia ESB Suite
o webmethods Enterprise Service Bus (SoftwareAG)
o (TIBCO) ActiveMatrix™ BusinessWorks
o IBM WebSphere ESB
o IBM WebSphere Message Broker
o Microsoft BizTalk Server
o Neudesic Neuron ESB
o Windows Azure Service Bus
o Oracle Enterprise Service Bus (BEA Logic)
o Progress Sonic ESB (acquired by Trilogy)
o Red Hat JBoss Fuse
o IONA (acquired by Progress)
 Open Source:
o Apache ServiceMix
o Apache Synapse
o JBoss ESB
o NetKernel
o Petals ESB
o Spring Integration
o Open ESB
o WSO2 ESB
o Mule
o UltraESB
o Red Hat Fuse ESB (based on Apache ServiceMix)
o Zato. ESB and application server. Open-source. In Python.
2.2 Databases
2.2.1 Introduction
Data storage has been necessary since time immemorial. Writing on stone, parchment or paper are the first examples of data storage. When computers and electronics appeared, magnetic tapes (1950) and diskettes (1960) were the first steps in computer data storage.
Currently, databases are an indispensable tool in the information society. They permit storing data and they also help to organize, protect and manage data.
2.2.2 Historical Evolution
As stated in [13], techniques for data storage and processing have evolved since the first element appeared. This element was the punched card, used for recording U.S. census data. It was invented by Herman Hollerith, and mechanical systems were used to process the cards and tabulate results. Later, punched cards were used for entering data into computers.
o 1950s and early 1960s: “Magnetic tapes were developed for data storage. Data processing tasks such
as payroll were automated, with data stored on tapes. Processing of data consisted of reading data
from one or more tapes and writing data to a new tape. Data could also be input from punched card
decks, and output to printers. […]
Tapes (and card decks) could be read only sequentially, and data sizes were much larger than main
memory; thus, data processing programs were forced to process data in a particular order, by reading
and merging data from tapes and card decks.” [13]
o Late 1960s and 1970s: “Widespread use of hard disks in the late 1960s changed the scenario for data
processing greatly, since hard disks allowed direct access to data. The position of data on disk was
immaterial, since any location on disk could be accessed in just tens of milliseconds. Data were thus
freed from the tyranny of sequentiality. With disks, network and hierarchical databases could be created
that allowed data structures such as lists and trees to be stored on disk. Programmers could construct
and manipulate these data structures.
A landmark paper by Codd [1970] defined the relational model and nonprocedural ways of querying
data in the relational model, and relational databases were born. The simplicity of the relational model
and the possibility of hiding implementation details completely from the programmer were enticing
indeed. Codd later won the prestigious Association of Computing Machinery Turing Award for his
work.” [13]
(Figure: timeline of database technology from the 1950s to the 2000s: magnetic tapes; hard disks; network and hierarchical databases; relational databases; SQL; the WWW and high transaction-processing rates; parallel and distributed databases; XML and XQuery.)
o 1980s: “Although academically interesting, the relational model was not used in practice initially,
because of its perceived performance disadvantages; relational databases could not match the
performance of existing network and hierarchical databases. That changed with System R, a ground
breaking project at IBM Research that developed techniques for the construction of an efficient
relational database system. Excellent overviews of System R are provided by Astrahan et al. [1976] and
Chamberlin et al. [1981]. The fully functional System R prototype led to IBM’s first relational database
product, SQL/DS. At the same time, the Ingres system was being developed at the University of
California at Berkeley. It led to a commercial product of the same name. Initial commercial relational
database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing
techniques for efficient processing of declarative queries. By the early 1980s, relational databases had
become competitive with network and hierarchical database systems even in the area of performance.
Relational databases were so easy to use that they eventually replaced network and hierarchical
databases; programmers using such databases were forced to deal with many low-level implementation
details, and had to code their queries in a procedural fashion. Most importantly, they had to keep
efficiency in mind when designing their programs, which involved a lot of effort. In contrast, in a
relational database, almost all these low-level tasks are carried out automatically by the database,
leaving the programmer free to work at a logical level. Since attaining dominance in the
1980s, the relational model has reigned supreme among data models.
The 1980s also saw much research on parallel and distributed databases, as well as initial work on
object-oriented databases.” [13]
o Early 1990s: “The SQL language was designed primarily for decision support applications, which are
query-intensive, yet the mainstay of databases in the 1980s was transaction-processing applications,
which are update-intensive. Decision support and querying re-emerged as a major application area for
databases. Tools for analysing large amounts of data saw large growths in usage.
Many database vendors introduced parallel database products in this period. Database vendors also
began to add object-relational support to their databases.” [13]
o 1990s: “The major event of the 1990s was the explosive growth of the World Wide Web. Databases
were deployed much more extensively than ever before. Database systems now had to support very high
transaction-processing rates, as well as very high reliability and 24 × 7 availability (availability 24
hours a day, 7 days a week, meaning no downtime for scheduled maintenance activities). Database
systems also had to support Web interfaces to data.” [13]
o 2000s: “The first half of the 2000s saw the emerging of XML and the associated query language XQuery
as a new database technology. Although XML is widely used for data exchange, as well as for storing
certain complex data types, relational databases still form the core of a vast majority of large-scale
database applications. In this time period we have also witnessed the growth in “autonomic-
computing/auto-admin” techniques for minimizing system administration effort. This period also saw
a significant growth in use of open-source database systems, particularly PostgreSQL and MySQL.
The latter part of the decade has seen growth in specialized databases for data analysis, in particular
column-stores, which in effect store each column of a table as a separate array, and highly parallel
database systems designed for analysis of very large data sets. Several novel distributed data-storage
systems have been built to handle the data management requirements of very large Web sites such as
Amazon, Facebook, Google, Microsoft and Yahoo!, and some of these are now offered as Web services
that can be used by application developers. There has also been substantial work on management and
analysis of streaming data, such as stock-market ticker data or computer network monitoring data.
Data-mining techniques are now widely deployed; example applications include Web-based product-
recommendation systems and automatic placement of relevant advertisements on Web pages.” [13]
2.2.3 Databases. Advantages and Disadvantages
Table 4: Advantages and Disadvantages of DB [14]
Advantages:
o Data independence with respect to treatments
o Consistency: no uncontrolled data redundancy
o Availability: data are not owned by individual users
o Greater data accessibility and responsiveness
o Greater value of the information
o Better and more standardized documentation of the information
o Greater efficiency in data collection, validation and entry
o Reduction of storage space
o Higher level of concurrency
o Backup and recovery services
o Integration in many applications

Disadvantages:
o The implementation of a database system can be very expensive, in both physical and logical equipment
o Specialized staff is required
o Lack of short-term profitability
o Real absence of standards
o Greater impact of failures
2.2.4 Databases and Database Management Systems
As stated in [13], a DBMS is:
“A database-management system (DBMS) is a collection of interrelated data and a set of programs to access
those data. The collection of data, usually referred to as the database, contains information relevant to an
enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is
both convenient and efficient.
Database systems are designed to manage large bodies of information. Management of data involves both
defining structures for storage of information and providing mechanisms for the manipulation of information.
In addition, the database system must ensure the safety of the information stored, despite system crashes or
attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible
anomalous results.”
DBMSs are used to allow users to access and manipulate the database and to provide administrators with the tools to perform maintenance and data management. Some of their features are [15]:
o Control of data redundancy: it minimizes the amount of storage space by avoiding duplication of information.
o Data sharing: data can be shared among many users simultaneously allowing maximum efficiency.
o Maintaining the integrity: it guarantees the accuracy or correctness of the information contained in a
database.
o Support for transaction control and fault recovery: transactions are controlled so that they do not compromise the integrity of the database. Fault recovery is the ability of a DBMS to recover information that is lost during a software or hardware failure (a JDBC sketch of transaction control is shown after this list).
o Data independence: In DBMS systems, application programs do not need to know the organization of
data on the hard disk. This is completely independent of it.
o Security: data availability may be restricted to certain users.
o Speed: modern DBMSs have high response and processing speeds.
o Hardware independence: most DBMSs are available for installation on multiple hardware platforms.
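As an illustration of the transaction control and fault recovery feature listed above, the following Java/JDBC fragment sketches how a unit of work is committed or rolled back as a whole. The accounts table and its columns are hypothetical and serve only to show the mechanism.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionSketch {
    // Executes two updates as a single unit of work: either both succeed or neither does.
    public static void transfer(Connection con, int from, int to, double amount) throws SQLException {
        con.setAutoCommit(false);                      // start an explicit transaction
        try (PreparedStatement debit = con.prepareStatement(
                 "UPDATE accounts SET balance = balance - ? WHERE id = ?");   // hypothetical table
             PreparedStatement credit = con.prepareStatement(
                 "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            debit.setDouble(1, amount);  debit.setInt(2, from);  debit.executeUpdate();
            credit.setDouble(1, amount); credit.setInt(2, to);   credit.executeUpdate();
            con.commit();                              // make both changes permanent together
        } catch (SQLException e) {
            con.rollback();                            // undo everything, preserving integrity
            throw e;
        } finally {
            con.setAutoCommit(true);
        }
    }
}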
3. Developing
The work has been divided into two parts: the first part is related to databases and the second part to coding in Java.
During the first month, databases and MySQL were studied. To make this possible, an open-source package called XAMPP, referenced in [16], which is an Apache distribution containing MySQL, was installed. Then, the JDBC connector referenced in [17] was installed as well.
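As an example of how the environment can be used once XAMPP and the MySQL JDBC connector are installed, the following minimal sketch opens a connection to the local MySQL server and runs a trivial query. The schema name, user and password are placeholders, not the ones used in the project.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectionSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");   // loads Connector/J (implicit with JDBC 4+)
        // URL of a local XAMPP MySQL instance; "i3res_test", "root" and the empty
        // password are placeholders for illustration only.
        String url = "jdbc:mysql://localhost:3306/i3res_test";
        try (Connection con = DriverManager.getConnection(url, "root", "");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT VERSION()")) {
            if (rs.next()) {
                System.out.println("Connected to MySQL " + rs.getString(1));
            }
        }
    }
}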
When the environment was ready, the book referenced in [18] was followed in order to learn MySQL. The book contains a lot of examples, most of which were completed; all of them are shown in “Annex A: diary of internships”.
Once MySQL had been learned, three weeks were dedicated to developing a GUI for the i3RES project. This GUI is intended for internal use, so some screens of the real GUI were provided as a reference. The next figure shows some of them.
After that, a database with real data was loaded in order to create statistical graphics. This database contains the consumption data of each customer, and a clustering of customers is generated in order to create user groups according to their consumption. The generated graphics show the number of customers in each group and the minimum, average and maximum consumption of each group.
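The statistics shown in these graphics can be computed with standard SQL aggregate functions through JDBC. The following sketch illustrates the kind of query involved; the table and column names (consumption_profiles, group_id, customer_id, consumption) are assumptions for illustration, since the real schema is the one described in Annex B.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GroupStatsSketch {
    // Prints, for every group, the number of customers and the min/avg/max consumption.
    public static void printGroupStats(Connection con) throws SQLException {
        String sql = "SELECT group_id, COUNT(DISTINCT customer_id) AS customers, "
                   + "MIN(consumption) AS min_c, AVG(consumption) AS avg_c, MAX(consumption) AS max_c "
                   + "FROM consumption_profiles GROUP BY group_id";   // hypothetical table
        try (PreparedStatement ps = con.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.printf("group %d: %d customers, min=%.2f avg=%.2f max=%.2f%n",
                        rs.getInt("group_id"), rs.getInt("customers"),
                        rs.getDouble("min_c"), rs.getDouble("avg_c"), rs.getDouble("max_c"));
            }
        }
    }
}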
4. Results
As a result of the work carried out during the internship, a working program was produced. The code is attached in “Annex B: Code” and the javadoc in “Annex C”. The class diagram of the main class is shown in the next figure.
The class I3RESGUI generates a GUI (Graphical User Interface) for internal use. The main screen is shown in the next figure. It offers three options. The first option goes to the next screen (shown in the next figure) and permits loading data from the DSO into the database. As the program is for testing, this option was only executed the first time the GUI was run. The data were read from 172 CSV files with about 23,000 entries each, and loading them into the database took about 43 hours because they were loaded line by line (Annex B, package loadData, class LoadData, method load()); a sketch of this kind of loader is shown after this paragraph. This screen also offers another option, “Customer Profiling”, which goes to the next screen (shown in the next figure). Two options are available in this screen. The first shows a dialog for choosing the statistical graphics, which will be explained later. The next option, “Perform new profiling”, goes to a screen with three options that allow choosing between different profilings (seasonal, daily or hourly). Clicking any of them opens another screen with three options: Define Parameters, Show Consumption Graphs and Perform Profiling. The first shows a dialog for defining which characteristics are required for performing the profiles. The other options are not implemented yet.
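As mentioned above, a sketch of this kind of line-by-line CSV loader is shown below. It is not the project's LoadData class (which is listed in Annex B): the file layout, table name and column names are assumptions for illustration. Grouping the inserts into batches, as in the sketch, is one way such a load could be made considerably faster than sending each row individually.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class CsvLoadSketch {
    // Loads a CSV file with lines of the assumed form "customerId;timestamp;consumption"
    // into an assumed "consumption" table, using batched inserts.
    public static void load(Connection con, String csvPath) throws Exception {
        String sql = "INSERT INTO consumption (customer_id, ts, kwh) VALUES (?, ?, ?)";
        con.setAutoCommit(false);
        try (BufferedReader in = new BufferedReader(new FileReader(csvPath));
             PreparedStatement ps = con.prepareStatement(sql)) {
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(";");
                ps.setInt(1, Integer.parseInt(f[0]));
                ps.setString(2, f[1]);
                ps.setDouble(3, Double.parseDouble(f[2]));
                ps.addBatch();                      // queue the row instead of sending it alone
                if (++count % 1000 == 0) {
                    ps.executeBatch();              // send 1000 rows in one round trip
                }
            }
            ps.executeBatch();                      // flush the remaining rows
            con.commit();
        } catch (Exception e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(true);
        }
    }
}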
The other two options on the main screen are not implemented yet because they are not necessary for the internship. When an option that is not implemented is clicked, a dialog (“We’re still working on this page”) appears.
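Such a placeholder dialog can be shown with a single Swing call. The following minimal sketch reproduces the message quoted above; the dialog title and the parent component are assumptions.

import javax.swing.JOptionPane;

public class NotImplementedDialog {
    // Shows the "not implemented yet" message used by the GUI for pending options.
    public static void show(java.awt.Component parent) {
        JOptionPane.showMessageDialog(parent,
                "We're still working on this page",
                "i3RES GUI", JOptionPane.INFORMATION_MESSAGE);   // title is a placeholder
    }
}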
In the “Show Current Profiles” option, six different graphics are available. The next figures show these graphics. The first three show the number of customers in each group on a daily, monthly and hourly basis. The other three show the minimum, average and maximum consumption of each group on a daily, monthly and hourly basis.
The class diagram of the other packages is shown in the next figure.
The packages are:
o I3RESconnector:
This package contains three classes:
o iSQLconnection: an interface with the methods necessary for interacting with the database (a sketch of what such an interface might declare is shown after this list).
o MySQL: a class which implements the iSQLconnection interface and connects to the database.
o Querys: a class that implements the queries to the database needed to generate the graphics.
o I3RESloadData:
This package contains one class:
o LoadData: this class has the method for loading the data into the database and other methods to create different tables at runtime.
o I3RESGraphics:
This package contains three classes:
o Clustering: this class generates the graphics called clustering.
o Dataset: this class obtains the data to generate the graphics called thresholds.
o Scatter: this class generates the graphics called thresholds.
o I3RESInterface:
This package contains three classes:
o DialogDefineParameters: this class generates a dialog for collecting the data needed to create profiles.
o MyException: this class is used for catching exceptions.
o I3RESGUI: this is the main class and it has been explained before.
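As mentioned in the description of the I3RESconnector package, the following is a sketch of what an interface such as iSQLconnection might declare. The method names and signatures are assumptions for illustration; the real interface is the one included in Annex B.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical shape of a database-access interface like iSQLconnection.
public interface ISqlConnectionSketch {
    Connection connect(String url, String user, String password) throws SQLException;
    ResultSet executeQuery(String sql) throws SQLException;     // SELECT statements
    int executeUpdate(String sql) throws SQLException;          // INSERT/UPDATE/DELETE
    void close() throws SQLException;
}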
5. Conclusions
Related to the internship:
1. I acquired the knowledge necessary for the development by studying the state of the art and carrying out the internship. In addition, I learned the formal drafting of documents.
2. Thanks to the development part, I have been able to learn more about the Java programming language and the MySQL technology.
3. I have reinforced the design and specification phase that precedes coding.
4. Another goal achieved is being able to deploy a complex application starting from a previous design. I have worked on a real development, with the corresponding code errors, software issues and so on, so I have been integrated in a real project with real problems.
5. Currently, I am about to finish my degree, and this internship has helped me to gain experience for future work.
Therefore, all the objectives have been achieved.
6. Future work
The line of work following this project is very extensive. The work done so far can be continued from several perspectives:
The first step is to finalize the application functionality. Once this is completed, the application could perform a clustering of users with parameters entered directly into the database. Currently, the database has five static profiles for testing. These parameters would be loaded into a temporary table.
Another step is to add new consumption graphics. For example, dynamic graphics which show the consumption
of each group as a function of time.
Apart from the application, another line of future work is to implement more specific and complex clustering algorithms.
7. References
[1] Seventh Framework Programme, “i3res,” 2013. [Online]. Available: http://www.i3res.eu/v1/. [Accessed 28 12
2014].
[2] B. C. Neuman, “University of Southern California,” 1994. [Online]. Available:
http://clifford.neuman.name/papers/pdf/94--_scale-dist-sys-neuman-readings-dcs.pdf. [Accessed 1 3 2014].
[3] G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts and Design, 3rd ed., Addison Wesley, 2000.
[4] H. C. E. M. A. A. De Dios Gómez Sebastian, “Blogspot,” 2009. [Online]. Available:
http://sdequipo2.blogspot.com.es/2009/04/ventajas-y-desventajas-del-sistema.html. [Accessed 1 3 2014].
[5] O. Hurtado Jara, “Monografías,” 2006. [Online]. Available: http://www.monografias.com/trabajos16/sistemas-
distribuidos/sistemas-distribuidos.shtml#EVOL. [Accessed 1 3 2014].
[6] Anonymous, “Docstoc,” 2012. [Online]. Available:
http://www.docstoc.com/docs/132521394/Comparaci%EF%BF%BDn-entre-JavaRMI-CORBA-y-DCOM.
[Accessed 1 3 2014].
[7] W. Contributors, “Wikipedia,” 2014. [Online]. Available: http://en.wikipedia.org/wiki/Software_architecture.
[Accessed 11 3 2014].
[8] Anonymous, “Wikispaces,” 2012. [Online]. Available: http://sistemasdistribuidos2012-
caece.wikispaces.com/Arquitectura+de+Sistemas+Distribuidos+-+Parte+II. [Accessed 14 3 2014].
[9] M. Rouse, “Techtarget,” 2008. [Online]. Available: http://searchsoa.techtarget.com/definition/service-oriented-
architecture. [Accessed 15 3 2014].
[10] Anonymous, “Oposicionestic,” 2012. [Online]. Available:
http://oposicionestic.blogspot.com.es/2012/08/arquitectura-soa-orientada-servicios.html. [Accessed 15 3 2014].
[11] W. Contributors, “Wikipedia,” 2014. [Online]. Available:
http://en.wikipedia.org/w/index.php?title=Universal_Description_Discovery_and_Integration&oldid=588254888.
[Accessed 16 3 2014].
[12] W. Contributors, “Wikipedia,” 2014. [Online]. Available:
http://en.wikipedia.org/w/index.php?title=Enterprise_service_bus&oldid=599460329. [Accessed 17 3 2014].
[13] A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, 2011.
[14] Universidad de Sevilla, “LSI,” [Online]. Available: http://www.lsi.us.es/docencia/get.php?id=5396. [Accessed 04 01 2015].
[15] “Estructura y Programación,” [Online]. Available: http://www.estructurayprogramacion.com/materias/administracion-de-base-de-datos/caracter%C3%ADsticas-del-dbms/. [Accessed 05 01 2015].
[16] Apache Friends, 2014. [Online]. Available: https://www.apachefriends.org/index.html. [Accessed 28 12 2014].
[17] Oracle, “MySQL,” 2014. [Online]. Available: http://dev.mysql.com/downloads/connector/j/. [Accessed 28 12
2014].
[18] L. Beighley, Head first SQL, O'Reilly, 2007.
Report_Internships

Mais conteúdo relacionado

Mais procurados

JAVA INTRODUCTION
JAVA INTRODUCTIONJAVA INTRODUCTION
JAVA INTRODUCTIONProf Ansari
 
Distribution systems efficiency
Distribution systems efficiencyDistribution systems efficiency
Distribution systems efficiencyAlexander Decker
 
Computing notes
Computing notesComputing notes
Computing notesthenraju24
 
1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed SystemsDaminda Herath
 
Massively Distributed Systems: Design Issues and Challenge
Massively Distributed Systems: Design Issues and ChallengeMassively Distributed Systems: Design Issues and Challenge
Massively Distributed Systems: Design Issues and ChallengeDarcyzz
 
A novel resource efficient dmms approach for network monitoring and controlli...
A novel resource efficient dmms approach for network monitoring and controlli...A novel resource efficient dmms approach for network monitoring and controlli...
A novel resource efficient dmms approach for network monitoring and controlli...ijwmn
 
Distributed system architecture
Distributed system architectureDistributed system architecture
Distributed system architectureYisal Khan
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel systemManish Singh
 
fundamentals & link layers jntuk material
fundamentals & link layers jntuk materialfundamentals & link layers jntuk material
fundamentals & link layers jntuk materialNagendra Reddy Panyam
 
An approach of software engineering through middleware
An approach of software engineering through middlewareAn approach of software engineering through middleware
An approach of software engineering through middlewareIAEME Publication
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemudaya khanal
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed systemsatish raj
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1Sumita Das
 
Current issues - International Journal of Network Security & Its Applications...
Current issues - International Journal of Network Security & Its Applications...Current issues - International Journal of Network Security & Its Applications...
Current issues - International Journal of Network Security & Its Applications...IJNSA Journal
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 

Mais procurados (20)

2012an20
2012an202012an20
2012an20
 
JAVA INTRODUCTION
JAVA INTRODUCTIONJAVA INTRODUCTION
JAVA INTRODUCTION
 
Distribution systems efficiency
Distribution systems efficiencyDistribution systems efficiency
Distribution systems efficiency
 
Computing notes
Computing notesComputing notes
Computing notes
 
1. Overview of Distributed Systems
1. Overview of Distributed Systems1. Overview of Distributed Systems
1. Overview of Distributed Systems
 
Massively Distributed Systems: Design Issues and Challenge
Massively Distributed Systems: Design Issues and ChallengeMassively Distributed Systems: Design Issues and Challenge
Massively Distributed Systems: Design Issues and Challenge
 
A novel resource efficient dmms approach for network monitoring and controlli...
A novel resource efficient dmms approach for network monitoring and controlli...A novel resource efficient dmms approach for network monitoring and controlli...
A novel resource efficient dmms approach for network monitoring and controlli...
 
B1802030511
B1802030511B1802030511
B1802030511
 
istributed system
istributed systemistributed system
istributed system
 
Distributed system architecture
Distributed system architectureDistributed system architecture
Distributed system architecture
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
fundamentals & link layers jntuk material
fundamentals & link layers jntuk materialfundamentals & link layers jntuk material
fundamentals & link layers jntuk material
 
An approach of software engineering through middleware
An approach of software engineering through middlewareAn approach of software engineering through middleware
An approach of software engineering through middleware
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed system
 
Aos distibutted system
Aos distibutted systemAos distibutted system
Aos distibutted system
 
Distributed systems1
Distributed systems1Distributed systems1
Distributed systems1
 
Current issues - International Journal of Network Security & Its Applications...
Current issues - International Journal of Network Security & Its Applications...Current issues - International Journal of Network Security & Its Applications...
Current issues - International Journal of Network Security & Its Applications...
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 

Semelhante a Report_Internships

A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENTA CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENTIJwest
 
Distributed computing
Distributed computingDistributed computing
Distributed computingshivli0769
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes SAhammedShakil
 
Distributed system
Distributed systemDistributed system
Distributed systemchirag patil
 
Software engg. pressman_ch-10
Software engg. pressman_ch-10Software engg. pressman_ch-10
Software engg. pressman_ch-10Dhairya Joshi
 
IRJET- Secure Scheme For Cloud-Based Multimedia Content Storage
IRJET-  	  Secure Scheme For Cloud-Based Multimedia Content StorageIRJET-  	  Secure Scheme For Cloud-Based Multimedia Content Storage
IRJET- Secure Scheme For Cloud-Based Multimedia Content StorageIRJET Journal
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfsnehan789
 
A Comparative Study: Taxonomy of High Performance Computing (HPC)
A Comparative Study: Taxonomy of High Performance Computing (HPC) A Comparative Study: Taxonomy of High Performance Computing (HPC)
A Comparative Study: Taxonomy of High Performance Computing (HPC) IJECEIAES
 
A Parallel Computing-a Paradigm to achieve High Performance
A Parallel Computing-a Paradigm to achieve High PerformanceA Parallel Computing-a Paradigm to achieve High Performance
A Parallel Computing-a Paradigm to achieve High PerformanceAM Publications
 
Ant colony Optimization: A Solution of Load balancing in Cloud  
Ant colony Optimization: A Solution of Load balancing in Cloud  Ant colony Optimization: A Solution of Load balancing in Cloud  
Ant colony Optimization: A Solution of Load balancing in Cloud  dannyijwest
 
Grid and cluster_computing_chapter1
Grid and cluster_computing_chapter1Grid and cluster_computing_chapter1
Grid and cluster_computing_chapter1Bharath Kumar
 
Cloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsCloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsIJEEE
 

Semelhante a Report_Internships (20)

Marta de la Cruz-Informe Final
Marta de la Cruz-Informe FinalMarta de la Cruz-Informe Final
Marta de la Cruz-Informe Final
 
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENTA CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENT
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes Distributed system Tanenbaum chapter 1,2,3,4 notes
Distributed system Tanenbaum chapter 1,2,3,4 notes
 
A01260104
A01260104A01260104
A01260104
 
Distributed system
Distributed systemDistributed system
Distributed system
 
chapter 3.pdf
chapter 3.pdfchapter 3.pdf
chapter 3.pdf
 
chapter 3.docx
chapter 3.docxchapter 3.docx
chapter 3.docx
 
Software engg. pressman_ch-10
Software engg. pressman_ch-10Software engg. pressman_ch-10
Software engg. pressman_ch-10
 
IRJET- Secure Scheme For Cloud-Based Multimedia Content Storage
IRJET-  	  Secure Scheme For Cloud-Based Multimedia Content StorageIRJET-  	  Secure Scheme For Cloud-Based Multimedia Content Storage
IRJET- Secure Scheme For Cloud-Based Multimedia Content Storage
 
publishable paper
publishable paperpublishable paper
publishable paper
 
Grid Presentation
Grid PresentationGrid Presentation
Grid Presentation
 
Distributed Systems in Data Engineering
Distributed Systems in Data EngineeringDistributed Systems in Data Engineering
Distributed Systems in Data Engineering
 
introduction to cloud computing for college.pdf
introduction to cloud computing for college.pdfintroduction to cloud computing for college.pdf
introduction to cloud computing for college.pdf
 
A Comparative Study: Taxonomy of High Performance Computing (HPC)
A Comparative Study: Taxonomy of High Performance Computing (HPC) A Comparative Study: Taxonomy of High Performance Computing (HPC)
A Comparative Study: Taxonomy of High Performance Computing (HPC)
 
A Parallel Computing-a Paradigm to achieve High Performance
A Parallel Computing-a Paradigm to achieve High PerformanceA Parallel Computing-a Paradigm to achieve High Performance
A Parallel Computing-a Paradigm to achieve High Performance
 
OS .pptx
OS .pptxOS .pptx
OS .pptx
 
Ant colony Optimization: A Solution of Load balancing in Cloud  
Ant colony Optimization: A Solution of Load balancing in Cloud  Ant colony Optimization: A Solution of Load balancing in Cloud  
Ant colony Optimization: A Solution of Load balancing in Cloud  
 
Grid and cluster_computing_chapter1
Grid and cluster_computing_chapter1Grid and cluster_computing_chapter1
Grid and cluster_computing_chapter1
 
Cloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithmsCloud computing Review over various scheduling algorithms
Cloud computing Review over various scheduling algorithms
 

Report_Internships

  • 1. 1 Development algorithms for analysing energy consumption for Smart Grid Final Report of “External Internships in CITSEM” Date: 12/01/2015 Author: Marta de la Cruz Martos Tutor: Pedro Castillejo Group of Work: GRyS Resumen En el documento expuesto se va a explicar el desarrollo de las prácticas realizadas en el Centro de Investigación en Tecnologías Software y Sistemas Multimedia para la Sostenibilidad (CITSEM) dentro del Grupo de Redes y Servicios de Próxima Generación (GRyS). Estas prácticas están enmarcadas en el proyecto I3RES cuyo objetivo es integrar las energías renovables en la red distribuida mediante la incorporación de inteligencia artificial. Dentro de este proyecto, la realización de estas prácticas se ha centrado en el desarrollo de algoritmos para analizar el consumo energético y realizar una clasificación de usuarios. Abstract This document explains the development of the internships at the Centre for Research in Technology and Multimedia Software Systems for Sustainability (CITSEM) within the group of Next-Generation Networks and Services (GRyS). This internships are framed in the project I3RES which aims to integrate renewable energy in the distributed network by incorporating artificial intelligent. Within this project, the internships has focused on developing algorithms for analysing energy consumption of the smart grid and to classify users.
  • 2. 1. Introduction The main objective of these internships is to develop algorithms in order to analyse the energy consumption for the Smart Grid. A Smart Grid is an improvement from the mains which integrates intelligent solutions to achieve a more efficient management and optimize the production and distribution of electricity. To make it possible, a European project called i3RES (ICT based Intelligent management of Integrated RES for the Smart Grid optimal operation) is being developing. A description of it can be read out in the project website referenced in [1]. The main objective is cited below: “Recognising the need, within the energy industry, to optimize the integration of renewable energy sources and new consumer energy needs in connection with socio-economic challenges, I3RES aims to integrate renewable energy sources in the distribution grid by incorporating intelligence at three different levels: in the integration of Renewable Energy Sources (RES) and the development of control and management mechanisms that reduce the impact of its intermittency; in the facilitation of the participation of all actors in the electricity market; and in the overall operation of the network. I3RES main goal is to develop a management tool for the distribution grid underpinned by: 1. A monitoring system that integrates information from already installed systems (e.g. SCADA, EMS and smart meters); 2. Energy production forecasting and network management algorithms that assist the distribution company in the management of massively distributed RES production and large scale RES production within the distribution network; 3. Data mining and artificial intelligence to analyse consumers' energy demand and production in the distribution grid.” In order to develop the internships, next objectives were set: 1. To make a study about the technologies related to the project. 2. To participate in the specification and design of software components. 3. To develop and implement algorithms. 4. To participate in the process of debugging and prototyping of components. 5. To prepare reports related to the tasks previously exposed. This document has the goal to show the work developed in the internships at CITSEM. First, a study of the state of art will be exposed. This study is divided in two parts: first part of the study is about distributed systems because the project is develop over it; second part is about database technologies which is the main part of the internships. Then, the developing part will be explain. After that, the results obtained will be shown. To finalize the document, the conclusions will be displayed, followed by future work.
2. State of the Art

2.1 Distributed Systems

2.1.1 Introduction

This part of the document is the result of the study of the state of the art about distributed systems. First of all, a distributed system will be defined, highlighting some of its features, properties and challenges, together with a summary table of its advantages and disadvantages with respect to other systems. After that, communication protocols will be presented: a brief introduction about the evolution of distributed systems will be made and then the study will focus on each of the different platforms and models, concluding with a summary table of the most important aspects of each protocol. The next section is about distributed systems architectures, where the study will focus on each architecture. To finish, a brief conclusion will recap the main points of distributed systems.

2.1.2 Distributed Systems

Neuman defines a distributed system in [2] as:

"A distributed system is a collection of computers, connected by a computer network working together to collectively implement some minimal set of services."

o Properties and Challenges

Distributed systems should attempt to provide some properties; Coulouris mentions the following challenges in [3]:

Heterogeneity: it applies to networks, computer hardware, operating systems, programming languages and so on.

Openness: a computer system is open if it can be extended in different ways. The terms open or closed can be applied both to hardware components (for example, peripherals) and to software components (for example, adding features to the operating system). The basic features of an open distributed system are: its interfaces are made public, it can be constructed from heterogeneous hardware and software, and it is based on the provision of a uniform inter-process communication mechanism and published interfaces to access shared resources.

Security: many of the information resources that are made available and maintained in distributed systems have a high intrinsic value to their users. Their security is therefore of considerable importance. Security for information resources has three components: confidentiality (protection against disclosure to unauthorized individuals), integrity (protection against alteration or corruption), and availability (protection against interference with the means to access the resources).

Scalability: a system is scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or an increase in administrative complexity [2].

Failure handling: information must be stored in a set of machines (redundancy), so that if a machine fails, another can take over its functions. Another solution is recovery software: designing software that is able to roll back the state of permanent data to before the failure occurred.
Concurrency: when multiple processes run on a single machine, we say that they run concurrently. If the computer is equipped with a single-core processor, concurrency is achieved by interleaving the processes (apparent concurrency). If the computer has N processors, up to N processes can run strictly in parallel.

Transparency: the entire system should work similarly at all points of the network. It is the concealment from the user and the application programmer of the separation of the components of a distributed system. The forms of transparency include, for example, access transparency, location transparency, migration transparency, replication transparency, failure transparency, concurrency transparency and so on.

Quality of Service (QoS): the main non-functional properties of systems that affect the QoS experienced by clients and users are reliability, security and performance. Another important aspect of service quality is adaptability to meet changing system configurations and resource availability.

o Advantages and Disadvantages

Table 1: Advantages and disadvantages of distributed systems [4]

Advantages:
o Processors are more powerful and less costly: development stations with more capabilities, stations tailored to the needs of the users, use of new interfaces.
o Advances in communications technology: availability of communication elements, development of new techniques.
o Sharing of resources: devices (hardware) and programs (software).
o Efficiency and flexibility: quick response, concurrent processes (on multiple computers), use of distributed processing techniques.
o Availability and reliability: the system is less prone to failure (if a component stops working, it does not affect the availability of the system); more services that enhance the functionality.
o Modular growth: growth is inherent to the system; new resources can be added quickly without affecting the current ones.

Disadvantages:
o Requirements for greater processing controls.
o Speed of propagation of information (very slow at times).
o Data replication services and services with a chance of failure.
o Greater access and process controls (commit).
o More complex management.
o Costs.

2.1.3 Distributed Systems Communication Protocols

Hurtado Jara describes in [5] that the first platforms were based on a single central computer which processed all the information (central processing, host). If the processing load increased, all the hardware of the central computer had to be replaced, which was very expensive. Also, the new graphical user interfaces (GUIs) led to a large increase in traffic that could collapse the system.
Another centralized model was to interconnect multiple computers which behave as servers connected to a local area network (group of servers). In this model the problem of saturation appears, for example when several users request a very large file (loss of transmission speed). The model that predominates today is CLIENT-SERVER (the client is a machine that requests a certain service, and the server is the one that provides the service). It decentralizes the processing and resources of each of the services and the displaying of the GUI. The main protocols and models are listed below:

o Socket
o RM-ODP
o CORBA
o RMI
o DCOM
o Servlets
o Java Beans
o SOAP

The main protocols are DCOM, CORBA and RMI. The next table compares them; for each aspect, the behaviour of DCOM, CORBA and RMI is listed.

Table 2: Comparison between DCOM, CORBA and RMI [6]

o Interfaces per object — DCOM: multiple interfaces per object, navigated with the QueryInterface() method; CORBA: an interface can support multiple inheritance from several interfaces.

o Identification of interfaces — DCOM: Interface ID or Class ID; CORBA: name or Implementation Repository (for servers); RMI: name or URL in the Registry (for servers).

o Passing complex types across the interface — DCOM: complex types must be declared in IDL; RMI: any public Java object can be passed as a parameter.

o Common base of all objects — DCOM: every interface inherits from IUnknown; CORBA: every interface inherits from CORBA.Object; RMI: all server objects implement java.rmi.Remote.

o Identification of remote server objects — DCOM: through an interface pointer; CORBA: through object references; RMI: with an ObjID, valid while working with the object.

o Generation of references to remote objects in the server — DCOM: in the network protocol, using the Object Exporter; CORBA: in the network protocol, using the Object Adapter; RMI: by calling UnicastRemoteObject.exportObject(this).
o Tasks (declaration of objects, establishment of the skeleton, …) — DCOM: done by the server program or handled dynamically by the COM system; CORBA: the constructor implicitly performs these tasks; RMI: the RMIRegistry performs the common tasks.

o Activation of objects — DCOM: CoCreateInstance; CORBA: the object is attached to a name or a trading service; RMI: lookup() on the URL of the remote object server.

o Requests for objects — DCOM: the client needs a pointer to the interface.

o Representation of the object name — DCOM: from the Registry; CORBA: from the Implementation Repository; RMI: from the RMIRegistry.

o Passing parameters between client and server — DCOM: defined in the interface definition (can be passed by value or by reference); CORBA: interface types by reference and the rest by value; RMI: objects implementing java.rmi.Remote interfaces are passed as remote references, the others by value.

o Garbage collection of references — DCOM: distributed garbage collection over the network through pinging mechanisms; CORBA: no garbage collection; RMI: garbage collection through mechanisms included in the JVM.

o Locating and activating an object implementation — DCOM: done by the SCM (Service Control Manager); CORBA: location by the ORB (Object Request Broker) and activation by the OA (Object Adapter); RMI: done by the JVM (Java Virtual Machine).

o What the client- and server-side objects are called — DCOM: client side: proxy, server side: stub; CORBA and RMI: client side: proxy or stub, server side: skeleton.

o Languages for coding objects — DCOM: multiple programming languages; CORBA: multiple programming languages, if ORB libraries are available for that language; RMI: only Java.

o Where the type information is kept — DCOM: Type Library; CORBA: Interface Repository; RMI: the object itself contains this information, which can be obtained through reflection and introspection.

o Return value on errors — DCOM: HRESULT; CORBA: exception objects; RMI: exceptions.

o Platform support — DCOM: runs on any platform that has implemented the COM service; CORBA: runs on any platform that has implemented a CORBA ORB; RMI: runs on any platform that has implemented the JVM.

o Wire protocol — DCOM: ORPC (Object Remote Procedure Call); CORBA: IIOP (Internet Inter-ORB Protocol); RMI: JRMP (Java Remote Method Protocol).
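To make the RMI column of the table concrete, the following is a minimal, self-contained sketch (not part of the original report) of a Java RMI service: a remote interface extending java.rmi.Remote, its implementation exported with UnicastRemoteObject.exportObject(), and a client looking it up in the registry. The service name and the method shown are illustrative assumptions.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: every method must declare RemoteException (see the RMI rows above).
interface ConsumptionService extends Remote {
    double getAverageConsumption(String customerId) throws RemoteException;
}

// Server-side implementation; exporting it lets the JVM generate the stub dynamically.
class ConsumptionServiceImpl implements ConsumptionService {
    @Override
    public double getAverageConsumption(String customerId) {
        return 42.0; // placeholder value for the sketch
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Export the object and register it under an illustrative name.
        ConsumptionService impl = new ConsumptionServiceImpl();
        ConsumptionService stub =
                (ConsumptionService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099); // default RMI port
        registry.rebind("ConsumptionService", stub);

        // Client side: look up the remote object and call it through the stub.
        ConsumptionService remote =
                (ConsumptionService) LocateRegistry.getRegistry("localhost", 1099)
                                                   .lookup("ConsumptionService");
        System.out.println(remote.getAverageConsumption("customer-001"));
    }
}

Running this in a single JVM shows the full round trip over JRMP; in a real deployment the server and client parts would live in separate processes.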
2.1.4 Distributed Systems Architectures

In this section, the different software architectures are studied, focusing on a specific architecture: SOA. The most common SOA models are reviewed, concentrating on the ESB model, presenting its features and products, and the different ESB platforms are listed.

Software architecture is defined in [7] as:

"The word software architecture intuitively denotes the high level structures of a software system. It can be defined as the set of structures needed to reason about the software system, which comprise the software elements, the relations between them, and the properties of both elements and relations. The term software architecture also denotes the set of practices used to select, define or design a software architecture. Finally, the term often denotes the documentation of a system's 'software architecture'. Documenting software architecture facilitates communication between stakeholders, captures early decisions about the high-level design, and allows reuse of design components between projects."

The most important software architectures are [8]:

o Multiprocessor architecture
o Client-server architecture
o Interorganizational distributed computing

For reasons of safety and interoperability, distributed computing has traditionally been implemented at the organizational level: an organization has multiple servers and shares its load between them, and since all servers are within the same organization, local standards and operational processes can be applied. More recent models allow interorganizational rather than intraorganizational distributed computing. Peer-to-peer computing is based on computations performed on individual nodes of the network. Service-oriented systems are related to distributed services instead of distributed objects, and also rely on XML-based data exchange standards.

Service Oriented Architecture is defined in [9] as:

"A service-oriented architecture (SOA) is the underlying structure supporting communications between services. SOA defines how two computing entities, such as programs, interact in such a way as to enable one entity to perform a unit of work on behalf of another entity. Service interactions are defined using a description
language. Each interaction is self-contained and loosely coupled, so that each interaction is independent of any other interaction."

In [10] it is said that in SOA the desired functionality is obtained by decomposing the software process into services, which can be distributed over different nodes connected through the network and used in combination. Services are basic units of functionality which operate independently. By combining collections of small modules (services), the applications needed for the relevant business processes are obtained. These modules can be used by users within the organization or outside it. Savings in development effort are achieved because the functionalities common to different applications are reused. Also, integration between organizations is favoured because the appearance and the level and type of input data for user validation are homogenized.

Collaboration between services is defined in [10] as:

"Collaboration between services is to determine the sequence of operations that must take place in the interaction between clients and servers. The sequence must respect the established order for it to be valid and, in order for this to be feasible, a coordination protocol is defined that describes the set of valid sequences."

Also, two models of collaboration between services are defined: orchestration and choreography.

o The orchestration model "is based on the existence of a centralized control mechanism that is responsible for directing the activities, with each interaction between services. It allows the definition of a model of interaction, but only from the point of view of the driver. The orchestration defines the behaviour and how to carry it out, and all events are monitored centrally."

o The choreography model "describes the behaviour to be observed between the interacting parties. Each of the organizations involved in this model independently develops the role it wants to play in the collaboration; the only condition is to respect the 'global contract' described by the choreography. Execution and control are the responsibility of the participants."

The technological SOA components are:

1. Enterprise Service Bus (ESB): where the services are deployed and running.

2. Universal Description, Discovery and Integration (UDDI), defined in [11] as:

"Universal Description, Discovery and Integration (UDDI, pronounced Yu-diː) is a platform-independent, Extensible Markup Language (XML)-based registry by which businesses worldwide can list themselves on the Internet, and a mechanism to register and locate web service applications. UDDI is an open industry initiative, sponsored by the Organization for the Advancement of Structured Information Standards (OASIS), for enabling businesses to publish service listings and discover each other, and to define how the services or software applications interact over the Internet."

3. Business Process Management (BPM): component for the orchestration of services in business processes.

4. Business Activity Monitoring (BAM): for visualization and monitoring of business activities.
2.1.5 Enterprise Service Bus

ESB is defined in [12] as:

"An enterprise service bus (ESB) is a software architecture model used for designing and implementing communication between mutually interacting software applications in a service-oriented architecture (SOA). As a software architectural model for distributed computing it is a specialty variant of the more general client server model and promotes agility and flexibility with regards to communication between applications. Its primary use is in enterprise application integration (EAI) of heterogeneous and complex landscapes."

The ESB's functions are defined in [12]:

o Invocation: support for synchronous and asynchronous transport protocols, service mapping (locating and binding).
o Routing: addressability, static/deterministic routing, content-based routing, rules-based routing, policy-based routing.
o Mediation: adapters, protocol transformation, service mapping.
o Messaging: message processing, message transformation and message enhancement.
o Process choreography: implementation of complex business processes.
o Service orchestration: coordination of multiple implementation services exposed as a single, aggregate service.
o Complex event processing: event interpretation, correlation, pattern matching.
o Other quality of service: security (encryption and signing), reliable delivery, transaction management.
o Management: monitoring, audit, logging, metering, admin console, BAM (BAM is not a management capability; in other words, the ESB does not react to a specific threshold. It is a business service capability surfaced to end users.).
o Agnosticism: general agnosticism to operating systems and programming languages; for example, it should enable interoperability between Java and .NET applications.
o Protocol conversion: comprehensive support for topical communication protocol service standards.
o Message exchange patterns: support for various MEPs (Message Exchange Patterns), for example synchronous request/response, asynchronous request/response, send-and-forget, publish/subscribe.
o Adapters: adapters for supporting integration with legacy systems, possibly based on standards such as JCA.
o Security: a standardized security model to authorize, authenticate and audit use of the ESB.
o Transformation: facilitation of the transformation of data formats and values, including transformation services (often via XSLT or XQuery) between the formats of the sending application and the receiving application.
o Validation: validation against schemas for sending and receiving messages.
o Governance: the ability to apply business rules uniformly.
o Enrichment: enriching messages from other sources.
o Split and merge: the splitting and combining of multiple messages and the handling of exceptions.
o Abstraction: the provision of a unified abstraction across multiple layers.
o Routing and transformation: routing or transforming messages conditionally, based on a non-centralized policy (without the need for a central rules engine).
o Queuing and staging: queuing, holding messages if applications temporarily become unavailable or work at different speeds.
o Commodity services: provisioning of commonly used functionality as shared services depending on context.

Table 3 shows the benefits and disadvantages of an ESB:

Table 3: Benefits and disadvantages of ESB [12]

Advantages:
o Increased flexibility; easier to change as requirements change.
o Scales from point solutions to enterprise-wide deployment (distributed bus).
o More configuration rather than integration coding.
o No central rules engine, no central broker.
o Incremental patching with zero downtime; the enterprise becomes "refactorable".

Disadvantages:
o Increased overhead.
o Slower communication speed, especially for already compatible services.

ESB products are classified in [12]:

o Commercial:
  o Adeptia ESB Suite
  o webMethods Enterprise Service Bus (Software AG)
  o TIBCO ActiveMatrix BusinessWorks
  o IBM WebSphere ESB
  o IBM WebSphere Message Broker
  o Microsoft BizTalk Server
  o Neudesic Neuron ESB
  o Windows Azure Service Bus
  o Oracle Enterprise Service Bus (BEA Logic)
  o Progress Sonic ESB (acquired by Trilogy)
  o Red Hat JBoss Fuse
  o IONA (acquired by Progress)

o Open source:
  o Apache ServiceMix
  o Apache Synapse
  o JBoss ESB
  o NetKernel
  o Petals ESB
  o Spring Integration
  o Open ESB
  o WSO2 ESB
  o Mule
  o UltraESB
  o Red Hat Fuse ESB (based on Apache ServiceMix)
  o Zato (ESB and application server, open source, in Python)
2.2 Databases

2.2.1 Introduction

Data storage has been necessary from time immemorial. Writing on stone, parchment or paper are the first examples of data storage. When computers and electronics appeared, magnetic tapes (1950) and diskettes (1960) were the first steps in computer data storage. Currently, databases are an indispensable tool in the information society. They permit storing data and also help to organise, protect and manage data.

2.2.2 Historical Evolution

As is said in [13], techniques for data storage and processing have evolved since the first element appeared. This element was the punched card and it was used for recording U.S. census data. It was invented by Herman Hollerith, and mechanical systems were used to process the cards and tabulate results. Later, punched cards were used for entering data into computers.

o 1950s and early 1960s: "Magnetic tapes were developed for data storage. Data processing tasks such as payroll were automated, with data stored on tapes. Processing of data consisted of reading data from one or more tapes and writing data to a new tape. Data could also be input from punched card decks, and output to printers. […] Tapes (and card decks) could be read only sequentially, and data sizes were much larger than main memory; thus, data processing programs were forced to process data in a particular order, by reading and merging data from tapes and card decks." [13]

o Late 1960s and 1970s: "Widespread use of hard disks in the late 1960s changed the scenario for data processing greatly, since hard disks allowed direct access to data. The position of data on disk was immaterial, since any location on disk could be accessed in just tens of milliseconds. Data were thus freed from the tyranny of sequentiality. With disks, network and hierarchical databases could be created that allowed data structures such as lists and trees to be stored on disk. Programmers could construct and manipulate these data structures. A landmark paper by Codd [1970] defined the relational model and nonprocedural ways of querying data in the relational model, and relational databases were born. The simplicity of the relational model and the possibility of hiding implementation details completely from the programmer were enticing indeed. Codd later won the prestigious Association of Computing Machinery Turing Award for his work." [13]

[Timeline figure: 1950s magnetic tapes; 1960s hard disks, network and hierarchical databases; 1970s relational databases and SQL; 1990s WWW, high transaction-processing rates, parallel and distributed databases; 2000s XML and XQuery]
o 1980s: "Although academically interesting, the relational model was not used in practice initially, because of its perceived performance disadvantages; relational databases could not match the performance of existing network and hierarchical databases. That changed with System R, a groundbreaking project at IBM Research that developed techniques for the construction of an efficient relational database system. Excellent overviews of System R are provided by Astrahan et al. [1976] and Chamberlin et al. [1981]. The fully functional System R prototype led to IBM's first relational database product, SQL/DS. At the same time, the Ingres system was being developed at the University of California at Berkeley. It led to a commercial product of the same name. Initial commercial relational database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing techniques for efficient processing of declarative queries. By the early 1980s, relational databases had become competitive with network and hierarchical database systems even in the area of performance. Relational databases were so easy to use that they eventually replaced network and hierarchical databases; programmers using such databases were forced to deal with many low-level implementation details, and had to code their queries in a procedural fashion. Most importantly, they had to keep efficiency in mind when designing their programs, which involved a lot of effort. In contrast, in a relational database, almost all these low-level tasks are carried out automatically by the database, leaving the programmer free to work at a logical level. Since attaining dominance in the 1980s, the relational model has reigned supreme among data models. The 1980s also saw much research on parallel and distributed databases, as well as initial work on object-oriented databases." [13]

o Early 1990s: "The SQL language was designed primarily for decision support applications, which are query-intensive, yet the mainstay of databases in the 1980s was transaction-processing applications, which are update-intensive. Decision support and querying re-emerged as a major application area for databases. Tools for analysing large amounts of data saw large growths in usage. Many database vendors introduced parallel database products in this period. Database vendors also began to add object-relational support to their databases." [13]

o 1990s: "The major event of the 1990s was the explosive growth of the World Wide Web. Databases were deployed much more extensively than ever before. Database systems now had to support very high transaction-processing rates, as well as very high reliability and 24 × 7 availability (availability 24 hours a day, 7 days a week, meaning no downtime for scheduled maintenance activities). Database systems also had to support Web interfaces to data." [13]

o 2000s: "The first half of the 2000s saw the emerging of XML and the associated query language XQuery as a new database technology. Although XML is widely used for data exchange, as well as for storing certain complex data types, relational databases still form the core of a vast majority of large-scale database applications. In this time period we have also witnessed the growth in 'autonomic-computing/auto-admin' techniques for minimizing system administration effort. This period also saw a significant growth in use of open-source database systems, particularly PostgreSQL and MySQL.
The latter part of the decade has seen growth in specialized databases for data analysis, in particular column-stores, which in effect store each column of a table as a separate array, and highly parallel database systems designed for analysis of very large data sets. Several novel distributed data-storage systems have been built to handle the data management requirements of very large Web sites such as Amazon, Facebook, Google, Microsoft and Yahoo!, and some of these are now offered as Web services that can be used by application developers. There has also been substantial work on management and analysis of streaming data, such as stock-market ticker data or computer network monitoring data. Data-mining techniques are now widely deployed; example applications include Web-based product-recommendation systems and automatic placement of relevant advertisements on Web pages." [13]
2.2.3 Databases: Advantages and Disadvantages

Table 4: Advantages and disadvantages of databases [14]

Advantages:
o Data independence with regard to treatments.
o Consistency: no uncontrolled data redundancy.
o Availability: data are not owned by users.
o Greater data accessibility and responsiveness.
o Greater value of the information.
o Better and more standardized documentation of the information.
o Greater efficiency in the collection, validation and entry of data.
o Reduction of storage space.
o Higher level of concurrency.
o Backup and recovery services.
o Integration in many applications.

Disadvantages:
o The implementation of a database system can be very expensive, both in physical and logical equipment.
o Specialized staff.
o Lack of short-term profitability.
o Real absence of standards.
o Greater impact of failures.

2.2.4 Database Management Systems

As is said in [13], a DBMS is:

"A database-management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. Database systems are designed to manage large bodies of information. Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information. In addition, the database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results."

DBMSs are used to allow users to access and manipulate the database and to provide administrators the tools to perform maintenance and data management. Some of their features are [15]:

o Control of data redundancy: it achieves a minimum amount of storage space by avoiding duplication of information.
o Data sharing: data can be shared among many users simultaneously, allowing maximum efficiency.
o Maintaining integrity: it guarantees the accuracy or correctness of the information contained in a database.
o Support for transaction control and fault recovery: transactions are controlled so that they do not alter the integrity of the database. Failure recovery is the ability of a DBMS to retrieve information that is lost during a software or hardware failure. (A small JDBC sketch of transaction control is shown after this list.)
o Data independence: in DBMS systems, application programs do not need to know the organization of the data on the hard disk; they are completely independent of it.
o Security: data availability may be restricted to certain users.
o Speed: modern DBMS systems have high response and processing speeds.
o Hardware independence: most DBMS systems are available for installation on multiple hardware platforms.
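As an illustration of the transaction control and recovery feature listed above, the following is a minimal JDBC sketch (not taken from the project code): two updates are grouped into one transaction that is either committed as a whole or rolled back if any statement fails. The connection URL, table name and column names are illustrative assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransactionSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical MySQL database and credentials, for illustration only.
        String url = "jdbc:mysql://localhost:3306/i3res_demo";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            con.setAutoCommit(false); // start a transaction
            try (PreparedStatement upd = con.prepareStatement(
                    "UPDATE consumption SET validated = ? WHERE customer_id = ?")) {
                upd.setBoolean(1, true);
                upd.setString(2, "customer-001");
                upd.executeUpdate();

                upd.setBoolean(1, true);
                upd.setString(2, "customer-002");
                upd.executeUpdate();

                con.commit(); // both updates become visible together
            } catch (SQLException e) {
                con.rollback(); // neither update is applied if something fails
                throw e;
            }
        }
    }
}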
3. Development

The work has been divided into two parts: the first part is related to databases and the second part to coding in Java.

During the first month, databases and MySQL were studied. To make this possible, an open-source package called XAMPP, referenced in [16], which is an Apache distribution containing MySQL, was installed, together with a JDBC connector, referenced in [17]. When the environment was ready, the book referenced in [18] was followed in order to learn MySQL. Most of the many examples in this book were worked through; all of them are shown in "Annex A: diary of internships".

Once MySQL was learned, three weeks were dedicated to developing a GUI for the i3RES project. This GUI is intended for internal use, so some screens of the real GUI were provided as a reference. The next figure shows some of these screens.

After that, a database with real data was loaded in order to create statistical graphics. This database contains the consumption data of each customer, and a clustering of customers is generated in order to create user groups divided according to their consumption. The generated graphics show the number of customers in each group and the minimum, average and maximum consumption of each group.
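The database load described above is a typical CSV-to-MySQL import. The following is a minimal sketch (not the project's LoadData class) of how such a load can be done with JDBC, using batched inserts to reduce the number of round trips compared with inserting and committing line by line. The file layout, table name and column names are assumptions made for the example.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvLoadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical database and CSV layout: customer_id;timestamp;consumption_kwh
        String url = "jdbc:mysql://localhost:3306/i3res_demo";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ins = con.prepareStatement(
                     "INSERT INTO consumption (customer_id, ts, kwh) VALUES (?, ?, ?)");
             BufferedReader in = Files.newBufferedReader(Paths.get("meter_data.csv"))) {

            con.setAutoCommit(false);
            String line;
            int pending = 0;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(";");
                ins.setString(1, f[0]);
                ins.setString(2, f[1]);
                ins.setDouble(3, Double.parseDouble(f[2]));
                ins.addBatch();            // queue the row instead of sending it immediately
                if (++pending % 1000 == 0) {
                    ins.executeBatch();    // send 1000 rows in one round trip
                    con.commit();
                }
            }
            ins.executeBatch();            // flush the remaining rows
            con.commit();
        }
    }
}

Batching and committing in blocks is one way the long load time reported in the results below could, in principle, be reduced.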
4. Results

As a result of the work developed during the internships, a working program was produced. The code is attached in "Annex B: Code" and the javadoc in "Annex C". The class diagram of the main class is shown in the next figure.

The class I3RESGUI generates a GUI (Graphical User Interface) for internal use. The main screen is shown in the next figure. It offers three options. The first option goes to the next screen (shown in the next figure) and permits loading data from the DSO into the database. As the program is for testing, this option was executed only the first time the GUI was run. The data were read from 172 CSV files with about 23000 entries per file. Loading the data into the database took about 43 hours because it was loaded line by line (Annex B, package I3RESloadData, class LoadData, method load()). This screen also offers another option, "Customer Profiling", which goes to the next screen (shown in the next figure). Two options are available in that screen. The first option shows a dialog for choosing statistical graphics, which will be explained later. The next option, "Perform new profiling", goes to the next screen, where there are three options to choose from. These options permit choosing between different profilings (seasonal, daily or hourly). If you click any option, another screen is shown with three options: Define Parameters, Show Consumption Graphs and Perform Profiling. The first option shows a dialog for defining which characteristics are required for performing profiles. The other options are not implemented yet. The other two options in the main screen are not implemented yet either, because they are not necessary for the internship development. When you click options that are not implemented, a dialog ("We're still working on this page") appears.
In the option "Show Current Profiles", six different graphic options are displayed. The next figures show these graphics. The first three show the number of customers in each group daily, monthly and hourly. The next graphics show the minimum, average and maximum consumption of each group daily, monthly and hourly.
The class diagram of the other packages is shown in the next figure.
The packages are:

o I3RESconnector: this package contains three classes:
  o iSQLconnection: an interface with the necessary methods for interacting with the database (a hypothetical sketch of what such an interface might look like is given after this list).
  o MySQL: a class which implements the interface iSQLconnection and connects to the database.
  o Querys: a class that implements the queries to the database needed to generate the graphics.
o I3RESloadData: this package contains one class:
  o LoadData: this class has the method for loading the data into the database and other methods to create different tables at runtime.
o I3RESGraphics: this package contains three classes:
  o Clustering: this class generates the graphics called clustering.
  o Dataset: this class obtains the data to generate the graphics called thresholds.
  o Scatter: this class generates the graphics called thresholds.
o I3RESInterface: this package contains three classes:
  o DialogDefineParameters: this class generates a dialog for gathering the data needed to create profiles.
  o MyException: this class is used for catching exceptions.
  o I3RESGUI: this is the main class; it has been explained before.
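The report does not reproduce the actual iSQLconnection code, so the following is only a hypothetical sketch of what such an interface and its MySQL implementation might look like; the interface name, method names and implementation details here are assumptions, not the project's real API.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical contract for database access; method names are illustrative only.
interface ISQLConnection {
    void connect(String url, String user, String password) throws SQLException;
    ResultSet executeQuery(String sql) throws SQLException;
    void disconnect() throws SQLException;
}

// A possible MySQL-backed implementation of the contract above.
class MySqlConnection implements ISQLConnection {
    private Connection connection;

    @Override
    public void connect(String url, String user, String password) throws SQLException {
        connection = DriverManager.getConnection(url, user, password);
    }

    @Override
    public ResultSet executeQuery(String sql) throws SQLException {
        // The Statement is left open so the caller can iterate the ResultSet;
        // a production class would manage and close it explicitly.
        Statement st = connection.createStatement();
        return st.executeQuery(sql);
    }

    @Override
    public void disconnect() throws SQLException {
        if (connection != null) {
            connection.close();
        }
    }
}

Splitting the contract (interface) from the MySQL-specific class is one way to let the graphics and loading code stay independent of the concrete database used.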
5. Conclusions

Related to the internships:

1. I acquired the knowledge necessary for the development by studying the state of the art and carrying out the internships. In addition, I learned the formal drafting of documents.
2. Thanks to the development part, I have been able to learn more about the Java coding language and the MySQL technology.
3. I have reinforced the design and specification work that precedes coding.
4. Another goal achieved is being able to deploy a complex application (with a previous design). I have worked on a real development with the corresponding code errors, software issues, and so on, so I have been integrated in a real project with real problems.
5. Currently, I am about to finish my degree and these internships have helped me to gain experience for future work.

Therefore, all the objectives have been achieved.

6. Future Work

The line of work following this project is very extensive, and the work done so far can be continued from several perspectives.

The first step is to finalize the application functionality. Once this is completed, the application could perform a clustering of users with parameters entered directly into the database. Currently, the database has five static profiles for testing. These parameters would be loaded into a temporary table.

Another step is to add new consumption graphics, for example dynamic graphics which show the consumption of each group as a function of time.

Apart from the application, another future work is to implement more specific and complex clustering algorithms; an illustrative sketch of such an algorithm follows.
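The report does not specify which clustering algorithm is used for the customer profiles, so the following is only an illustrative sketch of the kind of algorithm the future work refers to: a plain k-means over one-dimensional average consumption values. The data, the number of groups and the method names are assumptions made for the example.

import java.util.Arrays;
import java.util.Random;

public class KMeansSketch {
    // Groups customers by their average consumption into k clusters (1-D k-means).
    static int[] cluster(double[] consumption, int k, int iterations) {
        Random rnd = new Random(42);
        double[] centroids = new double[k];
        for (int i = 0; i < k; i++) {
            centroids[i] = consumption[rnd.nextInt(consumption.length)];
        }
        int[] assignment = new int[consumption.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each customer goes to the nearest centroid.
            for (int c = 0; c < consumption.length; c++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (Math.abs(consumption[c] - centroids[j])
                            < Math.abs(consumption[c] - centroids[best])) {
                        best = j;
                    }
                }
                assignment[c] = best;
            }
            // Update step: each centroid moves to the mean of its group.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int c = 0; c < consumption.length; c++) {
                sum[assignment[c]] += consumption[c];
                count[assignment[c]]++;
            }
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) {
                    centroids[j] = sum[j] / count[j];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Example: average daily consumption (kWh) of a few customers.
        double[] avgConsumption = {1.2, 1.5, 0.9, 7.8, 8.1, 4.0, 4.3, 7.5};
        System.out.println(Arrays.toString(cluster(avgConsumption, 3, 20)));
    }
}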
7. References

[1] Seventh Framework Programme, "i3RES," 2013. [Online]. Available: http://www.i3res.eu/v1/. [Accessed 28 12 2014].
[2] B. C. Neuman, University of Southern California, 1994. [Online]. Available: http://clifford.neuman.name/papers/pdf/94--_scale-dist-sys-neuman-readings-dcs.pdf. [Accessed 1 3 2014].
[3] G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts and Design, 3rd ed., Addison Wesley, 2000.
[4] S. De Dios Gómez et al., "Blogspot," 2009. [Online]. Available: http://sdequipo2.blogspot.com.es/2009/04/ventajas-y-desventajas-del-sistema.html. [Accessed 1 3 2014].
[5] O. Hurtado Jara, "Monografías," 2006. [Online]. Available: http://www.monografias.com/trabajos16/sistemas-distribuidos/sistemas-distribuidos.shtml#EVOL. [Accessed 1 3 2014].
[6] Anonymous, "Docstoc," 2012. [Online]. Available: http://www.docstoc.com/docs/132521394/Comparaci%EF%BF%BDn-entre-JavaRMI-CORBA-y-DCOM. [Accessed 1 3 2014].
[7] Wikipedia contributors, "Software architecture," 2014. [Online]. Available: http://en.wikipedia.org/wiki/Software_architecture. [Accessed 11 3 2014].
[8] Anonymous, "Wikispaces," 2012. [Online]. Available: http://sistemasdistribuidos2012-caece.wikispaces.com/Arquitectura+de+Sistemas+Distribuidos+-+Parte+II. [Accessed 14 3 2014].
[9] M. Rouse, "TechTarget," 2008. [Online]. Available: http://searchsoa.techtarget.com/definition/service-oriented-architecture. [Accessed 15 3 2014].
[10] Anonymous, "Oposicionestic," 2012. [Online]. Available: http://oposicionestic.blogspot.com.es/2012/08/arquitectura-soa-orientada-servicios.html. [Accessed 15 3 2014].
[11] Wikipedia contributors, "Universal Description, Discovery and Integration," 2014. [Online]. Available: http://en.wikipedia.org/w/index.php?title=Universal_Description_Discovery_and_Integration&oldid=588254888. [Accessed 16 3 2014].
[12] Wikipedia contributors, "Enterprise service bus," 2014. [Online]. Available: http://en.wikipedia.org/w/index.php?title=Enterprise_service_bus&oldid=599460329. [Accessed 17 3 2014].
[13] A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, 4th ed., McGraw Hill, 2011.
[14] Universidad de Sevilla, "lsi," [Online]. Available: http://www.lsi.us.es/docencia/get.php?id=5396. [Accessed 04 01 2015].
[15] "Estructura y Programación," [Online]. Available: http://www.estructurayprogramacion.com/materias/administracion-de-base-de-datos/caracter%C3%ADsticas-del-dbms/. [Accessed 05 01 2015].
[16] Apache Friends, 2014. [Online]. Available: https://www.apachefriends.org/index.html. [Accessed 28 12 2014].
[17] Oracle, "MySQL," 2014. [Online]. Available: http://dev.mysql.com/downloads/connector/j/. [Accessed 28 12 2014].
[18] L. Beighley, Head First SQL, O'Reilly, 2007.