SlideShare uma empresa Scribd logo
1 de 25
Automatically Inferring
ClassSheet Models from Spreadsheets
Most Influencial Paper Award 10Y
VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER
* ^ Jácome
Cunha
* + João Saraiva
† Martin Erwig
* University of Minho
^ NOVA LINCS
+ HASLab/INESC TEC
† Oregon State
University
jacome@di.uminho.pt
saraiva@di.uminho.pt
erwig@oregonstate.edu
Thank You Very Much!!
◦For the nomination
◦And to all that voted on us
2
10 Years Ago…
◦PhD student
◦VL/HCC 2009 (Corvallis)
◦My first paper at VL/HCC
◦Discovery-based edit assistance for spreadsheets.
Jácome Cunha, João Saraiva, and Joost Visser.
3
4
Spreadsheets??!!?!?
Well… 12 Years Ago...
5
Spreadsheets are Widely Used
8
Estimating the numbers of end users and end user programmers,
Christopher Scaffidi, Mary Shaw, and Brad Myers, 2005
Very Widely Used!
9
In 2004, RevenueRecognition.com (now
Softtrax) had the International Data Corporation
(IDC) interview 118 business leaders
IDC found that 85% were using spreadsheets in
financial reporting and forecasting
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008
What For? Everything! Important Tasks!
10
Economy losses of $10 billion/year!
http://www.eusprig.org/horror-stories.htm
11
Lets do a PhD on spreadsheets 
12
10 Years Ago…
◦My advisor, João Saraiva, suggested me to visit
Martin Erwig after VL/HCC 2009
◦And start working with him…
13
Automatically Inferring ClassSheet Models
from Spreadsheets
16
A ⇀ B
C D ⇀ E
...
are provided through a tool, called Gencel, which is an ex-
tension to MS Excel that restricts the definition and pos-
sible evolution of a concrete spreadsheet according to the
specification given by the template. Thus, illegal update
operations like a partial copying or moving of vertical (vex)
or horizontal (hex) recurring groups are prevented. Further-
more, formulas are updated correctly and automatically. For
example, in the case of inserting new instances of vex or hex
groups in a concrete generated spreadsheet. For details on
the type system and spreadsheet generation, we refer to [12].
Figures 2 and 3 demonstrate a somewhat more involved
example of a V it sl template and a corresponding Gencel
spreadsheet application for a budget calculation. We will
use this application to illustrate some limitations of the
V it sl / Gencel approach as a motivation for the proposed
ClassSheet model.
Figure 2 shows that a hex group has been defined by clus-
tering columns C, D, and E. The underlying problem domain
requirement was that for each year (and for each category)
the values Qnty, Cost and their product form a logical unit
and should occur in the spreadsheet. The only way to ex-
press this logical clustering of three cells within V it sl is by
the omission of layout-oriented notations as the two small
vertical bars in the header row between C, D, and E. In Gen-
cel, this grouping causes the corresponding insertion and
deletion of groups of three columns as blocks.2
Now imagine that the V it sl designer would have grouped
only the cells D and E, which would solely be visible in
V it sl by an additional bar within the header row between
the columns C and D. This notationally minimally differ-
ent V it sl model would have resulted in a completely differ-
ent spreadsheet application, in which the horizontal repeti-
tion would have been restricted to the two columns D and
E. This different grouping would express that the quantity
value is fixed for all years, while only the cost value might
vary yearly.
Another possible source for an error-prone spreadsheet
model is due to the indication of references in formu-
of data values, which are summed up and the sum of which
is shown in a separate cell. From an object-oriented point
of view, one can see a summation object, which aggregrates
a list of objects bearing single data values. Looking at the
layout structure, the list of single value objects, consisting
of a header Item and a list of value objects, is embedded into
the layout of the summation object, consisting of a header
entry Income and a footer with the label Total and an ag-
gregation formula assigned to an attribute named total. We
call such an object-oriented extended template a ClassSheet
since it defines classes together with their attributes and
aggregational relationships.
1
A
Income
2
3
Item
value = 0
4 Total
5 total = SUM(Item.value)
...
Incomeˆ Itemˆ 0↓
ˆ Totalˆ SUM((0, − 2))
Figur e 5: A sim ple one-dim ension al ClassSheet .
Thus, ClassSheets consists of a list of attribute definitions
grouped by classes and are arranged on a two dimensional
grid. Additional labels are used to annotate the concrete
representation. Class names are set in boldface in contrast
to attribute names and labels, which are set in normal face.
In addition, colored borders are used to depict the different
classes within a ClassSheet.3
Class parts may be spread over header and footer en-
tries, which results in a bracket-like structure indicated by a
square-bracket-like notation of (open) class rectangles. For
...
Generate
SS App
Detect
FDs
Infer
ER
Map to
UML
Original SS
Improved SS
Map to
RDB
Map to
other
paradigms
are provided through a tool, called Gencel, which is an ex-
tension to MS Excel that restricts the definition and pos-
sible evolution of a concrete spreadsheet according to the
specification given by the template. Thus, illegal update
operations like a partial copying or moving of vertical (vex)
or horizontal (hex) recurring groups are prevented. Further-
more, formulasareupdated correctly and automatically. For
example, in the case of inserting new instances of vex or hex
groups in a concrete generated spreadsheet. For details on
the type system and spreadsheet generation, we refer to [12].
Figures 2 and 3 demonstrate a somewhat more involved
example of a V it sl template and a corresponding Gencel
spreadsheet application for a budget calculation. We will
use this application to illustrate some limitations of the
V it sl / Gencel approach as a motivation for the proposed
ClassSheet model.
Figure 2 shows that a hex group has been defined by clus-
tering columns C, D, and E. The underlying problem domain
requirement was that for each year (and for each category)
the values Qnty, Cost and their product form a logical unit
and should occur in the spreadsheet. The only way to ex-
press this logical clustering of three cells within V it sl is by
the omission of layout-oriented notations as the two small
vertical bars in the header row between C, D, and E. In Gen-
cel, this grouping causes the corresponding insertion and
deletion of groups of three columns as blocks.2
Now imagine that the V it sl designer would have grouped
only the cells D and E, which would solely be visible in
V it sl by an additional bar within the header row between
the columns C and D. This notationally minimally differ-
ent V it sl model would have resulted in a completely differ-
of data values, which are summed up and the sum of which
is shown in a separate cell. From an object-oriented point
of view, one can see a summation object, which aggregrates
a list of objects bearing single data values. Looking at the
layout structure, the list of single value objects, consisting
of a header Item and a list of value objects, is embedded into
the layout of the summation object, consisting of a header
entry Income and a footer with the label Total and an ag-
gregation formula assigned to an attribute named total. We
call such an object-oriented extended template a ClassSheet
since it defines classes together with their attributes and
aggregational relationships.
total : Int
Income
value : Int = 0
Item
*
SUM(Item.value)
Incomeˆ Itemˆ 0↓
ˆ Totalˆ SUM((0, − 2))
Figur e 5: A sim ple one-dim ension al ClassSheet .
Thus, ClassSheets consists of a list of attribute definitions
grouped by classes and are arranged on a two dimensional
grid. Additional labels are used to annotate the concrete
UML Class
Diagram
FDs
ClassSheet
Relational DB
S(A,B)
T(C,D,E)
...
Map
to
CS
ER
A Slide from the Original Talk
17
Takeaway
◦Do not go into technical details that cannot be followed
by the audience
◦Convince them that is work is wonderful, amazing, the
best paper they will ever read
◦And they will read the paper (eventually)
18
19
We Were Influenced
20
We Influenced Many Students
◦Jorge Mendes (MDE, Classification)
◦Pedro Maia (Aspected-oriented programming)
◦Rui Pereira (Querying, Model refactoring)
◦Pedro Martins (Fault localization)
◦Hugo Ribeiro (Smells)
◦Ricardo Moreira (Version control)
◦Ricardo Teixeira (Automatic generation of spreadsheets)
◦Diogo Canteiro (Documentation)
◦André Silva (Automatic evolution)
◦Christophe Peixoto (Quality)
◦ Pedro Faria (Security)
21
We Influenced Many People
◦Research projects
◦Several papers published (>40)
◦Not only at VL/HCC, but also ICSE, ASE, FASE, ICMT,
ICSME, TSE, JSS
◦Prototypes
22
We Influenced Many other Works
◦ 61 citations on Google Scholar + 21 from journal version
◦ Rule-based spreadsheet data transformation from arbitrary to relational tables
AO Shigarov, AA Mikhailov
◦ A reverse engineering process for inferring data models from spreadsheet-based information systems: an automotive
industrial experience
D Amalfitano, AR Fasolino, P Tramontana et al.
◦ TabbyXL: Software platform for rule-based spreadsheet data extraction and transformation
A Shigarov, V Khristyuk, A Mikhailov
◦ Now You're Thinking With Structures: A Concept for Structure-based Interactions with Spreadsheets
P Koch
◦ Expandable group identification in spreadsheets
W Dou, S Han, L Xu, D Zhang, J Wei
◦ Semi-automatic Extraction of Cross-Table Data from a Set of Spreadsheets
A Swidan, F Hermans
◦ Migrating legacy spreadsheets-based systems to Web MVC architecture: an industrial case study
D Amalfitano, AR Fasolino, V Maggio et al.
23
Takeaway
• I was a young, naïve, innocent PhD student
• Visiting Martin was new
• Didn’t even know what to tell him
• But in the end we did an interesting work
• Listen to your advisor; (sometimes) she/he knows better ;)
• Send your students to meet other
researchers/groups/views/countries/cultures (I’m available ;)
25
Thanks to All My Co-Authors!
26
João Saraiva - João P. Fernandes - Jorge Mendes - Rui Pereira - Marco Couto -
Joost Visser - Alexandre Perez - Martin Erwig - Tiago Carção - Rui Abreu -
Pedro Martins - Rui Rua - Hugo Pacheco - Orlando Belo - Tiago Alves -
Gregor Engels - Stefan Sauer - Paulo Borba - Joao Costa Seco - Ana Moreira -
Joao Araujo - José María Conejero Manzano - Isabel Sofia Sousa Brito -
Danny Weyns - Uwe Zdun - M Goedicke - Jaime Chavarriaga - Vasco Amaral -
Armanda Rodrigues - Cláudio Ângelo Gonçalves Gomes - Ralf Lämmel -
Vadim Zaytsev - Henrique Rebêlo - Diogo Canteiro - Alcino Cunha -
Nuno Macedo - Jose N. Oliveira - Luis S. Barbosa - Alberto Pardo -
Caitlin Kelleher - Francisco Ribeiro
We Didn’t Solve It All
◦ We did many things
◦ But there’s always more to do
◦ E.g. Bas presented an editor for spreadsheets
formulas
27
28
It’s your turn now!
Thank you!
Questions?
30
Automatically Inferring
ClassSheet Models from Spreadsheets
Most Influencial Paper Award 10Y
VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER
* ^ Jácome
Cunha
* + João Saraiva
† Martin Erwig
* University of Minho
^ NOVA LINCS
+ HASLab/INESC TEC
† Oregon State
University
jacome@di.uminho.pt
saraiva@di.uminho.pt
erwig@oregonstate.edu

Mais conteúdo relacionado

Mais procurados

D I T211 Chapter 6
D I T211    Chapter 6D I T211    Chapter 6
D I T211 Chapter 6
askme
 

Mais procurados (19)

Microsoft excel
Microsoft excelMicrosoft excel
Microsoft excel
 
Excel 2007 Unit J
Excel 2007 Unit JExcel 2007 Unit J
Excel 2007 Unit J
 
Excel 2007 Unit G
Excel 2007 Unit GExcel 2007 Unit G
Excel 2007 Unit G
 
Office excel tips and tricks 201101
Office excel tips and tricks 201101Office excel tips and tricks 201101
Office excel tips and tricks 201101
 
Vba primer
Vba primerVba primer
Vba primer
 
Histogram
HistogramHistogram
Histogram
 
Lo T5 M7
Lo T5 M7Lo T5 M7
Lo T5 M7
 
BIS 155 Redefined Education--bis155.com
BIS 155 Redefined Education--bis155.comBIS 155 Redefined Education--bis155.com
BIS 155 Redefined Education--bis155.com
 
BIS 155 Education for Service--bis155.com
BIS 155 Education for Service--bis155.comBIS 155 Education for Service--bis155.com
BIS 155 Education for Service--bis155.com
 
BIS 155 Inspiring Innovation -- bis155.com
BIS 155 Inspiring Innovation -- bis155.comBIS 155 Inspiring Innovation -- bis155.com
BIS 155 Inspiring Innovation -- bis155.com
 
BIS 155 Lessons in Excellence / bis155.com
BIS 155 Lessons in Excellence / bis155.comBIS 155 Lessons in Excellence / bis155.com
BIS 155 Lessons in Excellence / bis155.com
 
BIS 155 PAPERS Education for Service--bis155papers.com
BIS 155 PAPERS Education for Service--bis155papers.comBIS 155 PAPERS Education for Service--bis155papers.com
BIS 155 PAPERS Education for Service--bis155papers.com
 
BIS 155 PAPERS Introduction Education--bis155papers.com
BIS 155 PAPERS Introduction Education--bis155papers.comBIS 155 PAPERS Introduction Education--bis155papers.com
BIS 155 PAPERS Introduction Education--bis155papers.com
 
BIS 155 PAPERS Become Exceptional--bis155papers.com
BIS 155 PAPERS Become Exceptional--bis155papers.comBIS 155 PAPERS Become Exceptional--bis155papers.com
BIS 155 PAPERS Become Exceptional--bis155papers.com
 
BIS 155 PAPERS Lessons in Excellence--bis155papers.com
BIS 155 PAPERS Lessons in Excellence--bis155papers.comBIS 155 PAPERS Lessons in Excellence--bis155papers.com
BIS 155 PAPERS Lessons in Excellence--bis155papers.com
 
BIS 155 PAPERS Redefined Education--bis155papers.com
BIS 155 PAPERS Redefined Education--bis155papers.comBIS 155 PAPERS Redefined Education--bis155papers.com
BIS 155 PAPERS Redefined Education--bis155papers.com
 
D I T211 Chapter 6
D I T211    Chapter 6D I T211    Chapter 6
D I T211 Chapter 6
 
Sq lite module7
Sq lite module7Sq lite module7
Sq lite module7
 
Excel.t04
Excel.t04Excel.t04
Excel.t04
 

Semelhante a Automatically Inferring ClassSheet Models from Spreadsheets

SSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
SSIS_SSRS_PPS_SP_SSAS_Hong_Bing LiSSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
SSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
Hong-Bing Li
 
SSIS_SSAS_SSRS_SP_PPS_HongBingLi
SSIS_SSAS_SSRS_SP_PPS_HongBingLiSSIS_SSAS_SSRS_SP_PPS_HongBingLi
SSIS_SSAS_SSRS_SP_PPS_HongBingLi
Hong-Bing Li
 
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docxMemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
andreecapon
 
Ssis Ssas Ssrs Sp Pps Hong Bing Li
Ssis Ssas Ssrs Sp Pps Hong Bing LiSsis Ssas Ssrs Sp Pps Hong Bing Li
Ssis Ssas Ssrs Sp Pps Hong Bing Li
Hong-Bing Li
 
Mapping inheritance structures_mapping_class
Mapping inheritance structures_mapping_classMapping inheritance structures_mapping_class
Mapping inheritance structures_mapping_class
Todor Kolev
 

Semelhante a Automatically Inferring ClassSheet Models from Spreadsheets (20)

Application of Graph Theory in Computer science using Data Structure.pdf
Application of Graph Theory in Computer science using Data Structure.pdfApplication of Graph Theory in Computer science using Data Structure.pdf
Application of Graph Theory in Computer science using Data Structure.pdf
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...Building data fusion surrogate models for spacecraft aerodynamic problems wit...
Building data fusion surrogate models for spacecraft aerodynamic problems wit...
 
Lab3cth
Lab3cthLab3cth
Lab3cth
 
theory of computation lecture 01
theory of computation lecture 01theory of computation lecture 01
theory of computation lecture 01
 
Connected charts explicit visualization of relationship between data graphics
Connected charts explicit visualization of relationship between data graphicsConnected charts explicit visualization of relationship between data graphics
Connected charts explicit visualization of relationship between data graphics
 
SSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
SSIS_SSRS_PPS_SP_SSAS_Hong_Bing LiSSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
SSIS_SSRS_PPS_SP_SSAS_Hong_Bing Li
 
A New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured DocumentsA New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured Documents
 
SSIS_SSAS_SSRS_SP_PPS_HongBingLi
SSIS_SSAS_SSRS_SP_PPS_HongBingLiSSIS_SSAS_SSRS_SP_PPS_HongBingLi
SSIS_SSAS_SSRS_SP_PPS_HongBingLi
 
Preparing for BIT – IT2301 Database Management Systems 2001f
Preparing for BIT – IT2301 Database Management Systems 2001fPreparing for BIT – IT2301 Database Management Systems 2001f
Preparing for BIT – IT2301 Database Management Systems 2001f
 
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docxMemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
MemorandumDesign4Practice (D4P) ProgramTo EGR 186 Students.docx
 
Lo T5 M7
Lo T5 M7Lo T5 M7
Lo T5 M7
 
Ssis Ssas Ssrs Sp Pps Hong Bing Li
Ssis Ssas Ssrs Sp Pps Hong Bing LiSsis Ssas Ssrs Sp Pps Hong Bing Li
Ssis Ssas Ssrs Sp Pps Hong Bing Li
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
Data structures and algorithms short note (version 14).pd
Data structures and algorithms short note (version 14).pdData structures and algorithms short note (version 14).pd
Data structures and algorithms short note (version 14).pd
 
Mapping inheritance structures_mapping_class
Mapping inheritance structures_mapping_classMapping inheritance structures_mapping_class
Mapping inheritance structures_mapping_class
 
Excel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment meExcel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment me
 
Relaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked dataRelaxing global-as-view in mediated data integration from linked data
Relaxing global-as-view in mediated data integration from linked data
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Introduction to Relational Database Management Systems
Introduction to Relational Database Management SystemsIntroduction to Relational Database Management Systems
Introduction to Relational Database Management Systems
 

Mais de Jácome Cunha

Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web Services
Jácome Cunha
 
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Jácome Cunha
 
Talk at the Joint SSaaPP/FATBIT 2012 Workshop
Talk at the Joint SSaaPP/FATBIT 2012 WorkshopTalk at the Joint SSaaPP/FATBIT 2012 Workshop
Talk at the Joint SSaaPP/FATBIT 2012 Workshop
Jácome Cunha
 
Talk SAC '12 - SE Track
Talk SAC '12 - SE TrackTalk SAC '12 - SE Track
Talk SAC '12 - SE Track
Jácome Cunha
 

Mais de Jácome Cunha (20)

Spreadsheet Engineering
Spreadsheet EngineeringSpreadsheet Engineering
Spreadsheet Engineering
 
Model-driven Spreadsheets
Model-driven SpreadsheetsModel-driven Spreadsheets
Model-driven Spreadsheets
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet Development
 
Energy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesEnergy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming Languages
 
Explaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsExplaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with Spreadsheets
 
On Understanding Data Scientists
On Understanding  Data ScientistsOn Understanding  Data Scientists
On Understanding Data Scientists
 
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
 
jStanley: Placing a Green Thumb on Java Collections
jStanley: Placing a Green Thumb on  Java CollectionsjStanley: Placing a Green Thumb on  Java Collections
jStanley: Placing a Green Thumb on Java Collections
 
Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web Services
 
MDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsMDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven Spreadsheets
 
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14
 
Summer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringSummer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet Engineering
 
Talk at VL/HCC '12
Talk at VL/HCC '12Talk at VL/HCC '12
Talk at VL/HCC '12
 
Talk at QUATIC '12
Talk at QUATIC '12Talk at QUATIC '12
Talk at QUATIC '12
 
Talk at the Joint SSaaPP/FATBIT 2012 Workshop
Talk at the Joint SSaaPP/FATBIT 2012 WorkshopTalk at the Joint SSaaPP/FATBIT 2012 Workshop
Talk at the Joint SSaaPP/FATBIT 2012 Workshop
 
Talk
TalkTalk
Talk
 
Talk at IS-EUD '11
Talk at IS-EUD '11Talk at IS-EUD '11
Talk at IS-EUD '11
 
Talk at EUSPRIG '11
Talk at EUSPRIG '11Talk at EUSPRIG '11
Talk at EUSPRIG '11
 
Talk at VL/HCC '11
Talk at VL/HCC '11Talk at VL/HCC '11
Talk at VL/HCC '11
 
Talk SAC '12 - SE Track
Talk SAC '12 - SE TrackTalk SAC '12 - SE Track
Talk SAC '12 - SE Track
 

Último

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
University of Hertfordshire
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 

Último (20)

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

Automatically Inferring ClassSheet Models from Spreadsheets

  • 1. Automatically Inferring ClassSheet Models from Spreadsheets Most Influencial Paper Award 10Y VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER * ^ Jácome Cunha * + João Saraiva † Martin Erwig * University of Minho ^ NOVA LINCS + HASLab/INESC TEC † Oregon State University jacome@di.uminho.pt saraiva@di.uminho.pt erwig@oregonstate.edu
  • 2. Thank You Very Much!! ◦For the nomination ◦And to all that voted on us 2
  • 3. 10 Years Ago… ◦PhD student ◦VL/HCC 2009 (Corvallis) ◦My first paper at VL/HCC ◦Discovery-based edit assistance for spreadsheets. Jácome Cunha, João Saraiva, and Joost Visser. 3
  • 5. Well… 12 Years Ago... 5
  • 7. Estimating the numbers of end users and end user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, 2005 Very Widely Used! 9
  • 8. In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation (IDC) interview 118 business leaders IDC found that 85% were using spreadsheets in financial reporting and forecasting Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 What For? Everything! Important Tasks! 10
  • 9. Economy losses of $10 billion/year! http://www.eusprig.org/horror-stories.htm 11
  • 10. Lets do a PhD on spreadsheets  12
  • 11. 10 Years Ago… ◦My advisor, João Saraiva, suggested me to visit Martin Erwig after VL/HCC 2009 ◦And start working with him… 13
  • 12. Automatically Inferring ClassSheet Models from Spreadsheets 16 A ⇀ B C D ⇀ E ... are provided through a tool, called Gencel, which is an ex- tension to MS Excel that restricts the definition and pos- sible evolution of a concrete spreadsheet according to the specification given by the template. Thus, illegal update operations like a partial copying or moving of vertical (vex) or horizontal (hex) recurring groups are prevented. Further- more, formulas are updated correctly and automatically. For example, in the case of inserting new instances of vex or hex groups in a concrete generated spreadsheet. For details on the type system and spreadsheet generation, we refer to [12]. Figures 2 and 3 demonstrate a somewhat more involved example of a V it sl template and a corresponding Gencel spreadsheet application for a budget calculation. We will use this application to illustrate some limitations of the V it sl / Gencel approach as a motivation for the proposed ClassSheet model. Figure 2 shows that a hex group has been defined by clus- tering columns C, D, and E. The underlying problem domain requirement was that for each year (and for each category) the values Qnty, Cost and their product form a logical unit and should occur in the spreadsheet. The only way to ex- press this logical clustering of three cells within V it sl is by the omission of layout-oriented notations as the two small vertical bars in the header row between C, D, and E. In Gen- cel, this grouping causes the corresponding insertion and deletion of groups of three columns as blocks.2 Now imagine that the V it sl designer would have grouped only the cells D and E, which would solely be visible in V it sl by an additional bar within the header row between the columns C and D. This notationally minimally differ- ent V it sl model would have resulted in a completely differ- ent spreadsheet application, in which the horizontal repeti- tion would have been restricted to the two columns D and E. This different grouping would express that the quantity value is fixed for all years, while only the cost value might vary yearly. Another possible source for an error-prone spreadsheet model is due to the indication of references in formu- of data values, which are summed up and the sum of which is shown in a separate cell. From an object-oriented point of view, one can see a summation object, which aggregrates a list of objects bearing single data values. Looking at the layout structure, the list of single value objects, consisting of a header Item and a list of value objects, is embedded into the layout of the summation object, consisting of a header entry Income and a footer with the label Total and an ag- gregation formula assigned to an attribute named total. We call such an object-oriented extended template a ClassSheet since it defines classes together with their attributes and aggregational relationships. 1 A Income 2 3 Item value = 0 4 Total 5 total = SUM(Item.value) ... Incomeˆ Itemˆ 0↓ ˆ Totalˆ SUM((0, − 2)) Figur e 5: A sim ple one-dim ension al ClassSheet . Thus, ClassSheets consists of a list of attribute definitions grouped by classes and are arranged on a two dimensional grid. Additional labels are used to annotate the concrete representation. Class names are set in boldface in contrast to attribute names and labels, which are set in normal face. In addition, colored borders are used to depict the different classes within a ClassSheet.3 Class parts may be spread over header and footer en- tries, which results in a bracket-like structure indicated by a square-bracket-like notation of (open) class rectangles. For ... Generate SS App Detect FDs Infer ER Map to UML Original SS Improved SS Map to RDB Map to other paradigms are provided through a tool, called Gencel, which is an ex- tension to MS Excel that restricts the definition and pos- sible evolution of a concrete spreadsheet according to the specification given by the template. Thus, illegal update operations like a partial copying or moving of vertical (vex) or horizontal (hex) recurring groups are prevented. Further- more, formulasareupdated correctly and automatically. For example, in the case of inserting new instances of vex or hex groups in a concrete generated spreadsheet. For details on the type system and spreadsheet generation, we refer to [12]. Figures 2 and 3 demonstrate a somewhat more involved example of a V it sl template and a corresponding Gencel spreadsheet application for a budget calculation. We will use this application to illustrate some limitations of the V it sl / Gencel approach as a motivation for the proposed ClassSheet model. Figure 2 shows that a hex group has been defined by clus- tering columns C, D, and E. The underlying problem domain requirement was that for each year (and for each category) the values Qnty, Cost and their product form a logical unit and should occur in the spreadsheet. The only way to ex- press this logical clustering of three cells within V it sl is by the omission of layout-oriented notations as the two small vertical bars in the header row between C, D, and E. In Gen- cel, this grouping causes the corresponding insertion and deletion of groups of three columns as blocks.2 Now imagine that the V it sl designer would have grouped only the cells D and E, which would solely be visible in V it sl by an additional bar within the header row between the columns C and D. This notationally minimally differ- ent V it sl model would have resulted in a completely differ- of data values, which are summed up and the sum of which is shown in a separate cell. From an object-oriented point of view, one can see a summation object, which aggregrates a list of objects bearing single data values. Looking at the layout structure, the list of single value objects, consisting of a header Item and a list of value objects, is embedded into the layout of the summation object, consisting of a header entry Income and a footer with the label Total and an ag- gregation formula assigned to an attribute named total. We call such an object-oriented extended template a ClassSheet since it defines classes together with their attributes and aggregational relationships. total : Int Income value : Int = 0 Item * SUM(Item.value) Incomeˆ Itemˆ 0↓ ˆ Totalˆ SUM((0, − 2)) Figur e 5: A sim ple one-dim ension al ClassSheet . Thus, ClassSheets consists of a list of attribute definitions grouped by classes and are arranged on a two dimensional grid. Additional labels are used to annotate the concrete UML Class Diagram FDs ClassSheet Relational DB S(A,B) T(C,D,E) ... Map to CS ER
  • 13. A Slide from the Original Talk 17
  • 14. Takeaway ◦Do not go into technical details that cannot be followed by the audience ◦Convince them that is work is wonderful, amazing, the best paper they will ever read ◦And they will read the paper (eventually) 18
  • 16. 20
  • 17. We Influenced Many Students ◦Jorge Mendes (MDE, Classification) ◦Pedro Maia (Aspected-oriented programming) ◦Rui Pereira (Querying, Model refactoring) ◦Pedro Martins (Fault localization) ◦Hugo Ribeiro (Smells) ◦Ricardo Moreira (Version control) ◦Ricardo Teixeira (Automatic generation of spreadsheets) ◦Diogo Canteiro (Documentation) ◦André Silva (Automatic evolution) ◦Christophe Peixoto (Quality) ◦ Pedro Faria (Security) 21
  • 18. We Influenced Many People ◦Research projects ◦Several papers published (>40) ◦Not only at VL/HCC, but also ICSE, ASE, FASE, ICMT, ICSME, TSE, JSS ◦Prototypes 22
  • 19. We Influenced Many other Works ◦ 61 citations on Google Scholar + 21 from journal version ◦ Rule-based spreadsheet data transformation from arbitrary to relational tables AO Shigarov, AA Mikhailov ◦ A reverse engineering process for inferring data models from spreadsheet-based information systems: an automotive industrial experience D Amalfitano, AR Fasolino, P Tramontana et al. ◦ TabbyXL: Software platform for rule-based spreadsheet data extraction and transformation A Shigarov, V Khristyuk, A Mikhailov ◦ Now You're Thinking With Structures: A Concept for Structure-based Interactions with Spreadsheets P Koch ◦ Expandable group identification in spreadsheets W Dou, S Han, L Xu, D Zhang, J Wei ◦ Semi-automatic Extraction of Cross-Table Data from a Set of Spreadsheets A Swidan, F Hermans ◦ Migrating legacy spreadsheets-based systems to Web MVC architecture: an industrial case study D Amalfitano, AR Fasolino, V Maggio et al. 23
  • 20. Takeaway • I was a young, naïve, innocent PhD student • Visiting Martin was new • Didn’t even know what to tell him • But in the end we did an interesting work • Listen to your advisor; (sometimes) she/he knows better ;) • Send your students to meet other researchers/groups/views/countries/cultures (I’m available ;) 25
  • 21. Thanks to All My Co-Authors! 26 João Saraiva - João P. Fernandes - Jorge Mendes - Rui Pereira - Marco Couto - Joost Visser - Alexandre Perez - Martin Erwig - Tiago Carção - Rui Abreu - Pedro Martins - Rui Rua - Hugo Pacheco - Orlando Belo - Tiago Alves - Gregor Engels - Stefan Sauer - Paulo Borba - Joao Costa Seco - Ana Moreira - Joao Araujo - José María Conejero Manzano - Isabel Sofia Sousa Brito - Danny Weyns - Uwe Zdun - M Goedicke - Jaime Chavarriaga - Vasco Amaral - Armanda Rodrigues - Cláudio Ângelo Gonçalves Gomes - Ralf Lämmel - Vadim Zaytsev - Henrique Rebêlo - Diogo Canteiro - Alcino Cunha - Nuno Macedo - Jose N. Oliveira - Luis S. Barbosa - Alberto Pardo - Caitlin Kelleher - Francisco Ribeiro
  • 22. We Didn’t Solve It All ◦ We did many things ◦ But there’s always more to do ◦ E.g. Bas presented an editor for spreadsheets formulas 27
  • 25. Automatically Inferring ClassSheet Models from Spreadsheets Most Influencial Paper Award 10Y VL/HCC 2019 MIP, MEMPHIS, USA, 14-18 OCTOBER * ^ Jácome Cunha * + João Saraiva † Martin Erwig * University of Minho ^ NOVA LINCS + HASLab/INESC TEC † Oregon State University jacome@di.uminho.pt saraiva@di.uminho.pt erwig@oregonstate.edu