In this tutorial we present our work on spreadsheet engineering. We start by presenting a model-driven spreadsheet development environment (MDSDE), where a domain-specific spreadsheet model is used to guide end users in introducing correct data. The business logic of spreadsheet data is modeled via domain-specific ClassSheet models. End users can not only edit and update the spreadsheet data in the traditional way, but also evolve the model and/or the data. Our MDSDE automatically guarantees model/instance synchronization after a model or instance evolution.
Summer School DSL 2013 - SpreadSheet Engineering
1. Spreadsheet Engineering
Jácome Cunha (1,2), João P. Fernandes (1,3), João Saraiva (1)
(1) HASLab / INESC TEC & Universidade do Minho
(2) ESTGF, Instituto Politécnico do Porto
(3) (rel)ease, Univ. da Beira Interior
Portugal
DSL 2013
2. This Tutorial
●Domain Specific Languages
●Visual Modeling Domain Specific Languages
●Embedding Domain Specific Languages
●Model-Driven Engineering
●Software Evolution
●Bidirectional Software Evolution
●Empirical Studies
3. This Tutorial
All defined in the context of spreadsheets,
and implemented in the functional programming language Haskell!
4. This Tutorial: Plan
Part I
● Motivation
● Spreadsheet Analysis
– Data Mining Techniques
● Models for Spreadsheets
– ClassSheets
– Embedded ClassSheet Models
● Model-driven Spreadsheets
5. Spreadsheets are widely used
Image taken from http://www.flickr.com/photos/cosmosfan/2414002070/
6. Why do Spreadsheets matter?
●Probably the Biggest Programming Language!
●Probably the Biggest Functional Programming Language!
●Probably the Biggest Domain Specific Language!
●Probably the Biggest software system!
●Probably the biggest database system!
8. Why do Spreadsheets matter?
Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting.
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
9. Why do Spreadsheets matter?
In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting.
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
10. Why do Spreadsheets matter?
50% of all spreadsheets are the basis for decisions.
Supporting professional spreadsheet users by generating leveled dataflow diagrams, Felienne Hermans, Martin Pinzger and Arie van Deursen, 2011
11. Why do Spreadsheets matter?
They are the programming language of choice for non-professional programmers, a.k.a. end users.
In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers.
Estimating the numbers of end users and end user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, 2005
12. Why do Spreadsheets matter?
Why are they so popular?
Which characteristics make them so successful?
First “empirical” study: fill in the survey!
13. But, as a programming language...
Exercise: Write a program that sums a
list of integer values.
import Prelude hiding (sum)  -- avoid clashing with the Prelude's own sum
sum :: [Int] -> Int
sum [] = 0
sum (h:t) = h + sum t
14. In fact, spreadsheets lack:
● Abstraction
● Encapsulation
● Type system
● Testing
● IDE
● ...
15. And the consequences may be...
http://www.eusprig.org/stories.htm
Economic losses of $10 billion/year!
18. Functional Dependencies
Informally, a functional dependency between a column A and another column B means that the values in column A determine the values in column B; that is, there are no two rows in the spreadsheet that have the same value in column A but differ in their values in column B.
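The informal definition above can be checked mechanically. The following is a minimal sketch, assuming each row has already been reduced to a pair (value in column A, value in column B); the name `holdsFD` is hypothetical and is not taken from the tools presented in this tutorial.

```haskell
import qualified Data.Map.Strict as Map

-- Each row is modelled as a pair: (value in column A, value in column B).
-- `holdsFD` is a hypothetical helper name used only for this illustration.
holdsFD :: (Ord a, Eq b) => [(a, b)] -> Bool
holdsFD = go Map.empty
  where
    go _ [] = True
    go seen ((a, b) : rest) =
      case Map.lookup a seen of
        -- Same value in A but a different value in B: the dependency is violated.
        Just b' | b' /= b -> False
        -- First time we see this A, or it agrees with what we saw before.
        _ -> go (Map.insert a b seen) rest
```

For instance, rows pairing a flight code with a single destination satisfy the dependency, while two rows with the same code and different destinations do not.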
22. Functional Dependencies
● We compute the business logic from the data, by inferring FDs.
● They are the building blocks for inferring models for (legacy) spreadsheets.
● The better the FDs we infer, the better the model we compute!
24. Functional Dependencies
● Label semantics: keys are often labeled “code” or “id”
● Label arrangement: we prefer FDs that respect the order of the columns
● Antecedent size: small keys are preferable
● Ratio: small ratio between keys and non-keys
● Single value columns: columns that always hold the same value appear in too many FDs
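To make the idea of ranking candidate FDs concrete, here is a small sketch that uses only two of the heuristics listed above (label semantics and antecedent size). The `FD` record, the function names, and the order in which the two criteria are combined are all assumptions made for illustration; the slides do not say how the heuristics are weighted.

```haskell
import Data.Char (toLower)
import Data.List (sortOn)

-- A candidate FD: antecedent (key) column labels and consequent column labels.
-- This record and the functions below are hypothetical, for illustration only.
data FD = FD { antecedent :: [String], consequent :: [String] }
  deriving (Eq, Show)

-- Label semantics: some antecedent label contains the word "code" or "id".
keyLikeLabel :: FD -> Bool
keyLikeLabel fd = any looksLikeKey (antecedent fd)
  where
    looksLikeKey label = any (`elem` ["code", "id"]) (words (map toLower label))

-- Assumed ordering: FDs with key-like labels first, then smaller antecedents.
-- sortOn is stable, so ties keep their original relative order.
rankFDs :: [FD] -> [FD]
rankFDs = sortOn (\fd -> (not (keyLikeLabel fd), length (antecedent fd)))
```

Under these assumptions, a candidate whose key is the single column "Pilot ID" is ranked ahead of one whose key is the pair "Name", "City".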
27. Relational Model
● Discovery-based Edit Assistance for Spreadsheets, Jácome Cunha, João Saraiva, and Joost Visser. In Proceedings of the 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2009).
● From Spreadsheets to Relational Databases and Back, Jácome Cunha, João Saraiva, and Joost Visser. In Proceedings of the 2009 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM 2009).
30. Spreadsheet Analysis
● Spreadsheet Querying: Rui Pereira (talk at the students workshop)
● Spreadsheet Smells: Pedro Martins (talk at the students workshop, not related to spreadsheets!)
33. Spreadsheet Smells
● We have implemented a full catalog of smells for spreadsheets: empty cell, pattern finder, reference to empty cells, multiple operations, multiple references, conditional complexity, long calculation chain, duplicated formulas, inappropriate intimacy, etc.
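As an illustration of what detecting one of these smells could look like, the sketch below flags the "reference to empty cells" smell on a deliberately tiny cell model. The types `Cell` and `Sheet`, the (column, row) addressing, and the function names are assumptions made for this example; they are not the data structures of the catalog mentioned above.

```haskell
import qualified Data.Map.Strict as Map

-- (column, row): an assumed addressing scheme for this sketch.
type Ref = (Int, Int)

-- A deliberately tiny cell model: a cell is empty, holds a constant, or holds
-- a formula that we reduce to the list of cells it references.
data Cell = Empty | Value Double | Formula [Ref]
  deriving (Eq, Show)

type Sheet = Map.Map Ref Cell

-- A referenced cell counts as empty if it is absent or explicitly Empty.
isEmptyAt :: Sheet -> Ref -> Bool
isEmptyAt sheet ref = Map.findWithDefault Empty ref sheet == Empty

-- For every formula cell, list the empty cells it refers to.
-- Formulas with no such references are left out of the result.
refsToEmpty :: Sheet -> [(Ref, [Ref])]
refsToEmpty sheet =
  [ (pos, bad)
  | (pos, Formula refs) <- Map.toList sheet
  , let bad = filter (isEmptyAt sheet) refs
  , not (null bad)
  ]
```

A real detector would work on parsed formulas rather than pre-extracted reference lists; that step is omitted here.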
34. Still...
Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men’s 100m final following an embarrassing ticketing mistake.
...
Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered ‘20,000’ into a spreadsheet rather than the correct figure of 10,000 remaining tickets.
The Telegraph, 04 January 2012
37. ClassSheets - Models in Spreadsheets
ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications, Gregor Engels, Martin Erwig, ASE'05
● ClassSheets are a high-level, object-oriented formalism to specify the business logic of spreadsheets
43. I. ClassSheet Model Inference
Automatically Inferring ClassSheet Models from Spreadsheets, Jácome Cunha, Martin Erwig, João Saraiva, VL/HCC'10
Data mining techniques
Database normalization theory
45. Still...
Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, “Growth in a Time of Debt”, which has been widely cited to justify budget-cutting.
Business Week, 18 April 2013
In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail’s pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question..
The Economist, 17 April 2013
47. Embedding ClassSheets in Spreadsheets
● Erwig implemented ClassSheets as a standalone language.
● A new processor (for ClassSheets) was developed from scratch.
48. Embedding ClassSheets in Spreadsheets
● From a ClassSheet, it produces an initial spreadsheet with the model embedded, to guide users in introducing correct data.
49. Embedding ClassSheets in Spreadsheets
● Embedding DSLs in general purpose programming languages is a recurring strategy
– systems inherit all the power of the host language
– implementation effort is much reduced
● We will present the embedding of the ClassSheet (DSL) model in traditional spreadsheet systems
50. Embedding ClassSheets in Spreadsheets
Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jácome Cunha, Jorge Mendes, João P. Fernandes, João Saraiva, VL/HCC '11
Powerful interactive interface
Single environment for spreadsheet evolution
Model-instance synchronization
Syntactic restrictions
56. Model-driven Spreadsheets
● Type-safe Evolution of Spreadsheets, Jácome Cunha, Joost Visser, Tiago Alves, João Saraiva. In Proceedings of Fundamental Approaches to Software Engineering (FASE 2011).
● MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João Paulo Fernandes, Jorge Mendes and João Saraiva. 34th International Conference on Software Engineering (ICSE 2012).
57. MDSheet Tool Demo Video
http://www.youtube.com/watch?feature=player_detailpage&v=6LNdTdCpV2U
58. Still...
Before the members of parliament, Souto Moura recalled the day on which the newspaper 24 Horas revealed the existence of a list of phone calls made by holders of high State offices, contained in diskettes attached to the Casa Pia case file. 'It was a day I will never forget; it was terrible, dramatic', said the now judge-counsellor of the Supreme Court of Justice. On that day (13 January 2006), he says he summoned to the PGR prosecutor João Guerra and deputy prosecutors Paula Ferraz and Cristina Faleiro. After viewing the diskettes, which held an Excel file with a filter that hid part of the information, the conclusion was that there was nothing on that medium beyond Paulo Pedroso's call listings. 'There are only Dr. Paulo Pedroso's numbers here' was the conclusion of the viewing.
Diário de Notícias, 10 February 2007
59. This Tutorial: Plan
Part II
● Model Evolution
● Bidirectional Evolution
● Empirical Study
60. Still...
http://www.eusprig.org/horror-stories.htm
● Title: Report identifies lack of spreadsheet controls, pressure to approve
● Source: http://files.shareholder.com/downloads/ONE/2261602328x0x628656/4cb574a0-0bf5-4728-9582-625e4519b5ab/Task_Force_Report.pdf
● Organization: JP Morgan
● Region: EU
● Release Date: 18 January 2013
● Risk: Lowering estimate of VaR in Basel II models
● Tags: Financial
● Spreadsheet Causes: Logic not reviewed, manual copy/paste operations
61. I. Model Evolution
FASE 2011. Type-safe Evolution of Spreadsheets, Jácome Cunha, Joost Visser, Tiago Alves, João Saraiva
VL/HCC 2011. Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
63. Why do Spreadsheets Need Evolution?
● Suppose now you need to add new information to the spreadsheet
– For instance, the number of passengers of each flight
● Doing this by hand requires several error-prone tasks
– Add columns, labels, update formulas, etc.
● We can do it automatically!
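The point of doing this automatically is that one evolution step updates the model and the data together, so they cannot drift apart. The sketch below shows that idea on a toy model (a list of column labels) and a toy instance (rows of cell texts). Real ClassSheet models are far richer and also carry formulas, which this sketch ignores; every name here is hypothetical and this is not the MDSheet implementation.

```haskell
-- A toy model: the ordered column labels. A toy instance: rows of cell texts.
-- Both are assumptions made for this sketch, not ClassSheet definitions.
newtype Model = Model [String]
  deriving (Eq, Show)

type Instance = [[String]]

-- Insert an element at a given position (appends if the index is too large).
insertAt :: Int -> a -> [a] -> [a]
insertAt i x xs = let (before, after) = splitAt i xs in before ++ x : after

-- One evolution step: add a labelled column to the model and, at the same
-- position, a default value to every row of the instance.
addColumn :: Int -> String -> String -> (Model, Instance) -> (Model, Instance)
addColumn i label def (Model cols, rows) =
  (Model (insertAt i label cols), map (insertAt i def) rows)

-- The invariant the step should preserve: one cell per model column in each row.
conforms :: Model -> Instance -> Bool
conforms (Model cols) = all ((== length cols) . length)
```

For example, adding a "Passengers" column with default "0" to a flights sheet changes the header list and every data row in a single step, and the result still satisfies `conforms`.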
65. Evolution Steps
● Combinators: defined as helper steps
● Semantic: steps that add information to the model
● Layout: steps that do not add information to the model, just change its arrangement
66. Combinator Steps
● Pull Up All References
– All references must be at the top level
– For instance
A×Bφ becomes (A×B)φ
● Apply After and friends
71. II. Bidirectional Evolution
ICMT 2011. Bidirectional Transformation of Model-Driven Spreadsheets, Jácome Cunha, João P. Fernandes, Jorge Mendes, Hugo Pacheco, and João Saraiva
ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
72. Why do Spreadsheets Need Evolution, Again?
● Some evolution steps are easier to perform on the instance
– For instance, to add a column to one of the repetition blocks
● People felt the need to evolve the data
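One way to picture bidirectional evolution is as a pair of translations between edits on the model and edits on the data, so that an edit made on either side can be mirrored on the other. The sketch below does this for two kinds of edit on a toy model (column labels) and a toy instance (data rows). The operation names and the representation are assumptions for illustration; they are not the operations defined in the cited papers.

```haskell
newtype Model = Model [String]   -- column labels (toy model)
  deriving (Eq, Show)

type Instance = [[String]]       -- data rows (toy instance)

-- Edits a modeller can make, and edits an end user can make on the data.
data ModelOp = AddColumn Int String | DropColumn Int
  deriving (Eq, Show)

data DataOp = InsertCells Int String | RemoveCells Int
  deriving (Eq, Show)

-- Translate an edit from one side to the other.
toDataOp :: ModelOp -> DataOp
toDataOp (AddColumn i label) = InsertCells i label
toDataOp (DropColumn i)      = RemoveCells i

toModelOp :: DataOp -> ModelOp
toModelOp (InsertCells i label) = AddColumn i label
toModelOp (RemoveCells i)       = DropColumn i

applyModelOp :: ModelOp -> Model -> Model
applyModelOp (AddColumn i label) (Model cs) = Model (take i cs ++ label : drop i cs)
applyModelOp (DropColumn i)      (Model cs) = Model (take i cs ++ drop (i + 1) cs)

-- New cells start out blank; the label only matters on the model side.
applyDataOp :: DataOp -> Instance -> Instance
applyDataOp (InsertCells i _) = map (\r -> take i r ++ "" : drop i r)
applyDataOp (RemoveCells i)   = map (\r -> take i r ++ drop (i + 1) r)

-- An edit made on the instance is applied there and mirrored onto the model.
evolveFromData :: DataOp -> (Model, Instance) -> (Model, Instance)
evolveFromData op (m, d) = (applyModelOp (toModelOp op) m, applyDataOp op d)
```

In this toy setting the two translations are inverses of each other, which is the simplest possible case; with richer models the translation in each direction would need more context.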
81. MDSheet
● Available at http://ssaapp.di.uminho.pt
● Built out of 7886 LOC:
– 3179 in Haskell, for the evolution and inference
– 980 in Basic, for the embedding
– 2665 in C++, for gluing all components
– 340 in Perl, for compilation and setup
– 722, for makefiles
ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jácome Cunha, João P. Fernandes, Jorge Mendes, João Saraiva
83. Study Settings
● 17 students from an MSc course
● 2 different spreadsheets
– Microsoft budget
– AGERE, the local company responsible for the water supply of Braga, Portugal
84. Study Setting
● Hypotheses:
(1) In order to perform a given set of tasks, users spend less time when using model-driven spreadsheets instead of plain ones.
(2) Spreadsheets developed in the model-driven environment contain fewer errors than plain ones.
95. Acknowledgments
This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048.
Editor's Notes
This generated spreadsheet guides users in introducing correct data. The spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after a user update.
Intended to be used by a trained person: a professional in spreadsheet models/ClassSheets, not by instance/data end users.
Based on Haskell. Integrated in OO. Driven by clicking buttons. Spectacular. Basic.