1. Identifying semantics characteristics of user’s
interactions datasets through an application of a data
analysis
Fernando de Assis RODRIGUES, Ph. D.
Pedro Henrique Santos BISI
Ricardo César Gonçalves SANT’ANA, Ph. D.
Graduate Program of Information Science
UNESP - São Paulo State University (Brazil)
2. Using data as part of the decision-making process is a reality in
professional environments.
“We need to decide [something].”
“Sure! Let’s use data to support it.”
3. The analyzed fact needs to receive inputs from
multiple data sources: structuring, integrating,
storing, and processing the collected data into an
output that supports a better understanding of the
fact from data, allowing new dimensions or
perspectives of analysis.
4. The use of data as part of the decision-making process in several areas.
[Figure: in each area, data (D) feed decision-making]
→ Science
→ Education
→ Industry
→ Management
→ Services
→ ...
6. The <entity, attribute, value> condition implies
the use of aggregated information about these elements to
ensure the minimal semantics needed to understand what is
available, notably in the steps of obtaining data
collections from data sources.
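A minimal sketch of the <entity, attribute, value> condition in Python. The `Triple` and `describe` names are hypothetical, and the value 120 is made up for illustration; only the entity and attribute labels come from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One <entity, attribute, value> observation."""
    entity: str     # what the value is about
    attribute: str  # which property of the entity is described
    value: object   # the observed content


def describe(t: Triple) -> str:
    """Render a triple as a readable statement."""
    return f"{t.entity} / {t.attribute} = {t.value!r}"


# Hypothetical observation: the value 120 is illustrative, not collected data.
sample = Triple("Page data", "Lifetime Likes by City - Tupã, SP, Brazil", 120)
print(describe(sample))
```

Without the entity and the attribute, the bare value carries no meaning for whoever collects it.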
7. Goal
To identify the semantic characteristics of data attributes
at the moment of collection, from the dataset structures found
in the data export interfaces of user-interaction analysis tools,
Internet communication channels, and web analytics
tools involved in the management of a scientific journal,
by applying a data analysis process and data
modeling techniques.
8. Methodology and Method
About the observed phenomenon
Investigation of datasets, entities, and attributes related to
the interaction between users and the communication
channels of a scientific journal.
Nature
Qualitative-quantitative research.
Purpose
Exploratory analysis to identify how data
are made available and structured in these data resources.
9. Methodology and Method
The path adopted in the investigation
Exploratory research of data export interfaces to collect
information about the available data and metadata.
About methods
(i) Data extraction and spreadsheet data handling with the
Python 3 programming language.
(ii) Application of the Entity-Relationship model to the
data generated by the analytics tools.
(iii) Use of data structures oriented to decision-making
processing (OLAP).
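Step (i) can be sketched with the Python 3 standard library, assuming the export interface delivers a CSV file with a header row. The function names and the sample export are hypothetical; the authors' actual scripts are not shown in the slides.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse data type from the non-empty string values of a column."""
    present = [v for v in values if v.strip()]
    if not present:
        return "empty"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def describe_attributes(csv_text):
    """Return {attribute label: inferred type} for a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    labels = reader.fieldnames or []
    return {label: infer_type([(r.get(label) or "") for r in rows])
            for label in labels}


# Hypothetical export: labels and values are made up for illustration.
SAMPLE = "Date,Impressions,CTR\n2017-09-01,10,0.5\n2017-09-02,12,0.25\n"
print(describe_attributes(SAMPLE))
```

The result lists only what the export itself reveals: text labels and guessed data types, with no further semantics.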
11. Methodology and Method
Data Sample
User interaction data* from RECoDAF - Electronic
Journal Digital Skills for Family Farming
→ Internal data sources: Open Journal Systems
→ External data sources: Facebook Insights, Google
Analytics, Google Search Console, and Twitter Analytics
* Data collected in September 2017. Available at
http://dadosabertos.info/data/collection_recodaf_2017
13. Results | Discussion
Find in the data sources information about:
→ Services
→ Resources
→ Datasets
→ Attributes
→ File Formats
→ Data types
[Figure: diagram of the Entity-Relationship model developed for data
collection, with the collected data feeding a data warehouse]
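The elements listed on this slide can be sketched as nested Python dataclasses. The nesting (a service has resources, a resource has datasets, a dataset has attributes) is an assumption, since the diagram itself is not reproduced in the text. The dataset name, file format, and data type in the sample are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    label: str
    data_type: str


@dataclass
class Dataset:
    name: str
    file_format: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Resource:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Service:
    name: str
    resources: List[Resource] = field(default_factory=list)


def attribute_labels(service):
    """List (resource, dataset, attribute label) for every attribute of a service."""
    return [(r.name, d.name, a.label)
            for r in service.resources
            for d in r.datasets
            for a in d.attributes]


# Hypothetical instance: dataset name, format, and data type are illustrative.
service = Service("Google Search Console", [
    Resource("Search Analytics", [
        Dataset("queries", "CSV", [Attribute("Impressions", "integer")]),
    ]),
])
print(attribute_labels(service))
```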
14. Results | Discussion
An entity (ex) may have two attributes (ax and ay) that share the
same semantics (S), even when the attributes show distinct
text labels at data collection.
15. Results | Discussion
Example
ex = Page data, from Facebook Insights
ax = “Lifetime Likes by City - Tupã, SP, Brazil”
ay = “Lifetime Likes by City - Bauru, SP, Brazil”
The attributes “Lifetime Likes by City - Tupã, SP,
Brazil” and “Lifetime Likes by City - Bauru, SP,
Brazil” have different text labels, but share the same
semantics.
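A minimal sketch of how the shared semantics could be recovered from the two labels, assuming a labeling rule of the form "metric - dimension value". That rule is inferred from these two labels only and is not documented by the export interface; the function names are hypothetical.

```python
def split_label(label, separator=" - "):
    """Split 'metric - dimension value' into (metric, dimension value)."""
    metric, found, member = label.partition(separator)
    return (metric, member) if found else (label, None)


def same_metric(label_a, label_b):
    """True when both labels describe the same metric under the assumed rule."""
    return split_label(label_a)[0] == split_label(label_b)[0]


AX = "Lifetime Likes by City - Tupã, SP, Brazil"
AY = "Lifetime Likes by City - Bauru, SP, Brazil"

print(split_label(AX))
print(same_metric(AX, AY))
```

The collector has to guess the rule, because the export does not state it.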
16. Results | Discussion
When text labels are the only semantic information available,
two attributes of distinct entities (ex and ey) that share the
same label (ax) do not necessarily share the same formal
semantics (S) when the data are collected by external agents.
17. Results | Discussion
Example
ax = “Impressions”
ex = Search Analytics, from Google Search Console
ey = Tweet Activity, from Twitter Analytics
Both attributes share the same text label, but this coincidence
does not ensure that the attributes have the same semantics. In
this example, each entity applied a different data type to this
attribute, resulting in a mismatch between their values.
“It’s an average!”
“It’s a total amount!”
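The mismatch can be sketched by attaching an explicit semantic annotation to each attribute. The `aggregation` field is a hypothetical annotation that the exports do not supply, and the assignment of "total" and "average" to each entity below is arbitrary, since the slides do not state which tool reports which.

```python
from typing import NamedTuple


class AttributeDescriptor(NamedTuple):
    entity: str
    label: str
    aggregation: str  # hypothetical annotation, not supplied by the export


def comparable(a, b):
    """Attributes are comparable only if label AND declared semantics match."""
    return a.label == b.label and a.aggregation == b.aggregation


# Arbitrary assignment for illustration; the source does not say which is which.
search = AttributeDescriptor("Search Analytics", "Impressions", "total")
tweets = AttributeDescriptor("Tweet Activity", "Impressions", "average")

print(search.label == tweets.label)  # labels coincide
print(comparable(search, tweets))    # semantics do not
```

Joining the two datasets on the label alone would silently mix totals with averages.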
18. Final considerations
The data analysis
→ Helped to identify the critical points of descriptive elements in
those datasets.
→ Exposed the lack of descriptive elements in the data collection process
when it is triggered through the available export interfaces.
19. Final considerations
To reduce this dissonance between attributes, export interfaces
can deliver more semantic information bound to the datasets.
→ Important information to interpret data available from
different sources.
For example: text labeling rules, controlled vocabularies, and restriction
clauses.
The semantic dissonances among these entities may interfere with
the development of relationships between attributes
from different datasets, decreasing the potential for
interoperability.
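A minimal sketch of semantic information that an export interface could bind to a dataset: a data type, a controlled vocabulary, and a restriction clause per attribute. The schema layout, the `Device` attribute, and its vocabulary are hypothetical and do not describe any of the analyzed tools.

```python
# Hypothetical descriptors bound to a dataset by the export interface.
SCHEMA = {
    "Impressions": {"type": int, "minimum": 0},  # restriction clause
    "Device": {"type": str, "allowed": {"desktop", "mobile", "tablet"}},  # vocabulary
}


def violations(row, schema):
    """Return a list of human-readable problems found in one exported row."""
    problems = []
    for label, rule in schema.items():
        if label not in row:
            problems.append(f"{label}: missing")
            continue
        value = row[label]
        if not isinstance(value, rule["type"]):
            problems.append(f"{label}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{label}: not in controlled vocabulary")
        if "minimum" in rule and value < rule["minimum"]:
            problems.append(f"{label}: below minimum {rule['minimum']}")
    return problems


print(violations({"Impressions": 5, "Device": "mobile"}, SCHEMA))
print(violations({"Impressions": -1, "Device": "tv"}, SCHEMA))
```

With descriptors like these delivered alongside the data, an external agent can check a row against the declared semantics rather than guessing from the label.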
fernando (at) rodrigues.pro.br
phbisi (at) gmail.com
ricardosantana (at) marilia.unesp.br
http://dadosabertos.info