Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Automatically Inferring ClassSheet Models from Spreadsheets
1. Automatically Inferring
ClassSheet Models from Spreadsheets
Most Influencial Paper Award 10Y
VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER
* ^ Jácome
Cunha
* + João Saraiva
† Martin Erwig
* University of Minho
^ NOVA LINCS
+ HASLab/INESC TEC
† Oregon State
University
jacome@di.uminho.pt
saraiva@di.uminho.pt
erwig@oregonstate.edu
2. Thank You Very Much!!
◦For the nomination
◦And to all that voted on us
2
3. 10 Years Ago…
◦PhD student
◦VL/HCC 2009 (Corvallis)
◦My first paper at VL/HCC
◦Discovery-based edit assistance for spreadsheets.
Jácome Cunha, João Saraiva, and Joost Visser.
3
7. Estimating the numbers of end users and end user programmers,
Christopher Scaffidi, Mary Shaw, and Brad Myers, 2005
Very Widely Used!
9
8. In 2004, RevenueRecognition.com (now
Softtrax) had the International Data Corporation
(IDC) interview 118 business leaders
IDC found that 85% were using spreadsheets in
financial reporting and forecasting
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008
What For? Everything! Important Tasks!
10
9. Economy losses of $10 billion/year!
http://www.eusprig.org/horror-stories.htm
11
11. 10 Years Ago…
◦My advisor, João Saraiva, suggested me to visit
Martin Erwig after VL/HCC 2009
◦And start working with him…
13
12. Automatically Inferring ClassSheet Models
from Spreadsheets
16
A ⇀ B
C D ⇀ E
...
are provided through a tool, called Gencel, which is an ex-
tension to MS Excel that restricts the definition and pos-
sible evolution of a concrete spreadsheet according to the
specification given by the template. Thus, illegal update
operations like a partial copying or moving of vertical (vex)
or horizontal (hex) recurring groups are prevented. Further-
more, formulas are updated correctly and automatically. For
example, in the case of inserting new instances of vex or hex
groups in a concrete generated spreadsheet. For details on
the type system and spreadsheet generation, we refer to [12].
Figures 2 and 3 demonstrate a somewhat more involved
example of a V it sl template and a corresponding Gencel
spreadsheet application for a budget calculation. We will
use this application to illustrate some limitations of the
V it sl / Gencel approach as a motivation for the proposed
ClassSheet model.
Figure 2 shows that a hex group has been defined by clus-
tering columns C, D, and E. The underlying problem domain
requirement was that for each year (and for each category)
the values Qnty, Cost and their product form a logical unit
and should occur in the spreadsheet. The only way to ex-
press this logical clustering of three cells within V it sl is by
the omission of layout-oriented notations as the two small
vertical bars in the header row between C, D, and E. In Gen-
cel, this grouping causes the corresponding insertion and
deletion of groups of three columns as blocks.2
Now imagine that the V it sl designer would have grouped
only the cells D and E, which would solely be visible in
V it sl by an additional bar within the header row between
the columns C and D. This notationally minimally differ-
ent V it sl model would have resulted in a completely differ-
ent spreadsheet application, in which the horizontal repeti-
tion would have been restricted to the two columns D and
E. This different grouping would express that the quantity
value is fixed for all years, while only the cost value might
vary yearly.
Another possible source for an error-prone spreadsheet
model is due to the indication of references in formu-
of data values, which are summed up and the sum of which
is shown in a separate cell. From an object-oriented point
of view, one can see a summation object, which aggregrates
a list of objects bearing single data values. Looking at the
layout structure, the list of single value objects, consisting
of a header Item and a list of value objects, is embedded into
the layout of the summation object, consisting of a header
entry Income and a footer with the label Total and an ag-
gregation formula assigned to an attribute named total. We
call such an object-oriented extended template a ClassSheet
since it defines classes together with their attributes and
aggregational relationships.
1
A
Income
2
3
Item
value = 0
4 Total
5 total = SUM(Item.value)
...
Incomeˆ Itemˆ 0↓
ˆ Totalˆ SUM((0, − 2))
Figur e 5: A sim ple one-dim ension al ClassSheet .
Thus, ClassSheets consists of a list of attribute definitions
grouped by classes and are arranged on a two dimensional
grid. Additional labels are used to annotate the concrete
representation. Class names are set in boldface in contrast
to attribute names and labels, which are set in normal face.
In addition, colored borders are used to depict the different
classes within a ClassSheet.3
Class parts may be spread over header and footer en-
tries, which results in a bracket-like structure indicated by a
square-bracket-like notation of (open) class rectangles. For
...
Generate
SS App
Detect
FDs
Infer
ER
Map to
UML
Original SS
Improved SS
Map to
RDB
Map to
other
paradigms
are provided through a tool, called Gencel, which is an ex-
tension to MS Excel that restricts the definition and pos-
sible evolution of a concrete spreadsheet according to the
specification given by the template. Thus, illegal update
operations like a partial copying or moving of vertical (vex)
or horizontal (hex) recurring groups are prevented. Further-
more, formulasareupdated correctly and automatically. For
example, in the case of inserting new instances of vex or hex
groups in a concrete generated spreadsheet. For details on
the type system and spreadsheet generation, we refer to [12].
Figures 2 and 3 demonstrate a somewhat more involved
example of a V it sl template and a corresponding Gencel
spreadsheet application for a budget calculation. We will
use this application to illustrate some limitations of the
V it sl / Gencel approach as a motivation for the proposed
ClassSheet model.
Figure 2 shows that a hex group has been defined by clus-
tering columns C, D, and E. The underlying problem domain
requirement was that for each year (and for each category)
the values Qnty, Cost and their product form a logical unit
and should occur in the spreadsheet. The only way to ex-
press this logical clustering of three cells within V it sl is by
the omission of layout-oriented notations as the two small
vertical bars in the header row between C, D, and E. In Gen-
cel, this grouping causes the corresponding insertion and
deletion of groups of three columns as blocks.2
Now imagine that the V it sl designer would have grouped
only the cells D and E, which would solely be visible in
V it sl by an additional bar within the header row between
the columns C and D. This notationally minimally differ-
ent V it sl model would have resulted in a completely differ-
of data values, which are summed up and the sum of which
is shown in a separate cell. From an object-oriented point
of view, one can see a summation object, which aggregrates
a list of objects bearing single data values. Looking at the
layout structure, the list of single value objects, consisting
of a header Item and a list of value objects, is embedded into
the layout of the summation object, consisting of a header
entry Income and a footer with the label Total and an ag-
gregation formula assigned to an attribute named total. We
call such an object-oriented extended template a ClassSheet
since it defines classes together with their attributes and
aggregational relationships.
total : Int
Income
value : Int = 0
Item
*
SUM(Item.value)
Incomeˆ Itemˆ 0↓
ˆ Totalˆ SUM((0, − 2))
Figur e 5: A sim ple one-dim ension al ClassSheet .
Thus, ClassSheets consists of a list of attribute definitions
grouped by classes and are arranged on a two dimensional
grid. Additional labels are used to annotate the concrete
UML Class
Diagram
FDs
ClassSheet
Relational DB
S(A,B)
T(C,D,E)
...
Map
to
CS
ER
14. Takeaway
◦Do not go into technical details that cannot be followed
by the audience
◦Convince them that is work is wonderful, amazing, the
best paper they will ever read
◦And they will read the paper (eventually)
18
17. We Influenced Many Students
◦Jorge Mendes (MDE, Classification)
◦Pedro Maia (Aspected-oriented programming)
◦Rui Pereira (Querying, Model refactoring)
◦Pedro Martins (Fault localization)
◦Hugo Ribeiro (Smells)
◦Ricardo Moreira (Version control)
◦Ricardo Teixeira (Automatic generation of spreadsheets)
◦Diogo Canteiro (Documentation)
◦André Silva (Automatic evolution)
◦Christophe Peixoto (Quality)
◦ Pedro Faria (Security)
21
18. We Influenced Many People
◦Research projects
◦Several papers published (>40)
◦Not only at VL/HCC, but also ICSE, ASE, FASE, ICMT,
ICSME, TSE, JSS
◦Prototypes
22
19. We Influenced Many other Works
◦ 61 citations on Google Scholar + 21 from journal version
◦ Rule-based spreadsheet data transformation from arbitrary to relational tables
AO Shigarov, AA Mikhailov
◦ A reverse engineering process for inferring data models from spreadsheet-based information systems: an automotive
industrial experience
D Amalfitano, AR Fasolino, P Tramontana et al.
◦ TabbyXL: Software platform for rule-based spreadsheet data extraction and transformation
A Shigarov, V Khristyuk, A Mikhailov
◦ Now You're Thinking With Structures: A Concept for Structure-based Interactions with Spreadsheets
P Koch
◦ Expandable group identification in spreadsheets
W Dou, S Han, L Xu, D Zhang, J Wei
◦ Semi-automatic Extraction of Cross-Table Data from a Set of Spreadsheets
A Swidan, F Hermans
◦ Migrating legacy spreadsheets-based systems to Web MVC architecture: an industrial case study
D Amalfitano, AR Fasolino, V Maggio et al.
23
20. Takeaway
• I was a young, naïve, innocent PhD student
• Visiting Martin was new
• Didn’t even know what to tell him
• But in the end we did an interesting work
• Listen to your advisor; (sometimes) she/he knows better ;)
• Send your students to meet other
researchers/groups/views/countries/cultures (I’m available ;)
25
21. Thanks to All My Co-Authors!
26
João Saraiva - João P. Fernandes - Jorge Mendes - Rui Pereira - Marco Couto -
Joost Visser - Alexandre Perez - Martin Erwig - Tiago Carção - Rui Abreu -
Pedro Martins - Rui Rua - Hugo Pacheco - Orlando Belo - Tiago Alves -
Gregor Engels - Stefan Sauer - Paulo Borba - Joao Costa Seco - Ana Moreira -
Joao Araujo - José María Conejero Manzano - Isabel Sofia Sousa Brito -
Danny Weyns - Uwe Zdun - M Goedicke - Jaime Chavarriaga - Vasco Amaral -
Armanda Rodrigues - Cláudio Ângelo Gonçalves Gomes - Ralf Lämmel -
Vadim Zaytsev - Henrique Rebêlo - Diogo Canteiro - Alcino Cunha -
Nuno Macedo - Jose N. Oliveira - Luis S. Barbosa - Alberto Pardo -
Caitlin Kelleher - Francisco Ribeiro
22. We Didn’t Solve It All
◦ We did many things
◦ But there’s always more to do
◦ E.g. Bas presented an editor for spreadsheets
formulas
27
25. Automatically Inferring
ClassSheet Models from Spreadsheets
Most Influencial Paper Award 10Y
VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER
* ^ Jácome
Cunha
* + João Saraiva
† Martin Erwig
* University of Minho
^ NOVA LINCS
+ HASLab/INESC TEC
† Oregon State
University
jacome@di.uminho.pt
saraiva@di.uminho.pt
erwig@oregonstate.edu