The importance of FAIR and the Community of Data Driven Insights - the road to the science of the future
1. The importance of Data-Driven Communities
Road to the next Science
Carlos Utrilla Guerrero (Data Scientist)
c.utrillaguerrero@maastrichtuniversity.nl
3. How can we make a better science?
“This is perhaps an old, global and persistent question. The transition towards the Next Science can happen through Open and FAIR principles.”
CDDI Commentary (Draft): https://docs.google.com/document/d/1VQ7hBd8UOdvjGec2BwwCjC2cO3k6b9CU9NQsySh8MT0/edit?usp=sharing
4. “No-one ever said FAIR was easy, but we have to go through the hardship of making our resources FAIR to enable better science together. It benefits everyone to make it as easy as possible for communities to make steps in the direction of optimally achievable FAIRness in their domain.” - p. 26*
Opinions of the original creators of the FAIR principles*
* FAIR Principles: Interpretations and Implementation Considerations (2020): https://doi.org/10.1162/dint_r_00024
5. Cathedral thinking - commitment to long-term visions
Leonardo Da Vinci
conceptualized the idea of
people being able to fly 400
years
Sagrada Familia, Spain. Gaudí began designing the structure in
1882, inspired by a far-reaching vision that extended well
beyond his lifetime. Construction on the church is scheduled to
wrap up in 2026.
Taking long-term thinking to a whole new level:
seven generations!
The foundations of success for the Next Science
6. Enablers of discovery and innovations
Starting point of the journey: Open Science and FAIR
Preliminary report on the first draft of the Recommendation on Open Science
7. Google Trends: FAIR Data topic
https://trends.google.com/trends/explore?date=today%205-y&q=%2Fg%2F11g88dlvw4
9. Visualising the relationship between concepts
Starting point of the journey: Open Science and FAIR
https://trends.google.com/trends/explore?date=today%205-y&q=%2Fg%2F11g88dlvw4,%2Fm%2F025ttdm,%2Fm%2F0j9kvph
11. An international, bottom-up paradigm for
the discovery and reuse of digital content
by and for people and machines
12. FAIR principles do not dictate specific technological implementations; rather, they provide guidance for improving the...
1. Findability: globally unique, persistent identifiers, rich metadata, and indexing for search.
2. Accessibility: retrievable using a standardised, open protocol that specifies access restrictions where necessary.
3. Interoperability: metadata use a formal, accessible, shared and broadly applicable language for knowledge representation, with FAIR vocabularies and qualified links to other resources.
4. Reusability: accurate and relevant attributes, provenance, and a data usage license, meeting domain-relevant community standards.
...of digital resources.
Software
Metadata
Data
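The four principles above can be read as a checklist over a resource's metadata. The following is a minimal sketch in Python, assuming a hypothetical `ResourceMetadata` record and a hypothetical `fair_gaps` helper. The field names and the mapping of fields to principles are illustrative assumptions, not part of the FAIR principles themselves.

```python
from dataclasses import dataclass


@dataclass
class ResourceMetadata:
    """Hypothetical metadata record for a digital resource (illustration only)."""
    identifier: str = ""       # globally unique, persistent identifier (Findable)
    description: str = ""      # rich, searchable metadata (Findable)
    access_protocol: str = ""  # standardised, open protocol (Accessible)
    vocabulary: str = ""       # shared vocabulary used by the metadata (Interoperable)
    license: str = ""          # data usage license (Reusable)
    provenance: str = ""       # where the resource came from (Reusable)


# Assumed mapping from each principle to the fields that support it.
REQUIRED_FIELDS = {
    "Findable": ["identifier", "description"],
    "Accessible": ["access_protocol"],
    "Interoperable": ["vocabulary"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record: ResourceMetadata) -> dict[str, list[str]]:
    """Return, per FAIR principle, the fields that are still empty."""
    gaps = {}
    for principle, names in REQUIRED_FIELDS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


# Example record with a placeholder identifier (example.org is not a real resource).
record = ResourceMetadata(
    identifier="https://example.org/id/123",
    description="Survey responses, cleaned and documented",
    access_protocol="https",
)
print(fair_gaps(record))
```

A check like this only reports which metadata fields are missing; it says nothing about their quality, so it is a starting point for a conversation rather than a FAIRness score.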
13. Community For Data Driven Insights (CDDI)
Building the road to the science of the future
c.utrillaguerrero@maastrichtuniversity.nl
14. Towards a FAIR University: from principle to practice
Showcases to prove concept across disciplines
Check the conceptual map here:
https://embed.kumu.io/6a045ad6e4c091c5bf4600de8af94d5f#untitled-map/ludeme
16. Ludeme Summary
● What?
○ Computational study
○ World’s traditional games
○ Recorded human history
● Objectives
1. Model: Full range of traditional games in a single
playable database
2. Reconstruct: Missing knowledge about games
more accurately
3. Map: Spread of games and assoc. mathematical
ideas through history
● Question:
○ Can we use modern computational techniques to
help improve our understanding of ancient
culture?
Slides: Cameron Browne, 2018, http://www.ludeme.eu/outputs/browne-bgs-2018.pdf
17. Digital Ludeme Portal, one solution to a FAIR dataset
“DLP Database illustrates how we are obtaining this data and storing it in a consistent format both for
use within the Digital Ludeme Project and for the benefit of other researchers and practitioners,
and how we are adopting FAIR principles to make this database as reliable, accessible and useful as
possible.” - Matthew Stephenson et al., 2020
18. DLP Database path of FAIRness
The DLP database is fully public and accessible on a PHP server that supports SQL.
A codebook/data dictionary file with all variables, naming conventions, descriptions, and raw and transformed data.
Documentation and a readme with release version, tutorials, license and user guide.
An explanation of the ETL (extract, transform, load) process.
Unambiguous identification of games: https://ludii.games/identifier?Id=DLP.Games.427
Ludii Dataset
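The identifier URL shown on this slide can be built and parsed programmatically. The sketch below assumes that every game identifier follows the `DLP.Games.<number>` pattern of the single example on the slide; that pattern and the helper names `game_identifier_url` and `parse_game_identifier` are assumptions for illustration, not a documented Ludii API.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Resolver URL taken from the example on the slide.
BASE_URL = "https://ludii.games/identifier"


def game_identifier_url(game_number: int) -> str:
    """Build an identifier URL, assuming the 'DLP.Games.<number>' pattern."""
    query = urlencode({"Id": f"DLP.Games.{game_number}"})
    return f"{BASE_URL}?{query}"


def parse_game_identifier(url: str) -> int:
    """Extract the game number from an identifier URL of the assumed pattern."""
    params = parse_qs(urlparse(url).query)
    if "Id" not in params:
        raise ValueError(f"No 'Id' parameter in {url!r}")
    parts = params["Id"][0].split(".")
    if len(parts) != 3 or parts[:2] != ["DLP", "Games"]:
        raise ValueError(f"Unexpected identifier format: {params['Id'][0]!r}")
    return int(parts[2])


url = game_identifier_url(427)
print(url)
print(parse_game_identifier(url))
```

Keeping the identifier scheme in one place like this makes it easier to keep links stable if the resolver location ever changes.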
19. Partly FAIR may be fair enough but let's improve it!
Persistent URL schemas
Searchable by major search engines
Use FAIR vocabularies and qualified links
Resource identifier: https://ludii.games/library.php
Record provenance and follow community standards
20. Research excellence and the Lawgex Project
How to implement FAIR is particular and
unique to the community in which you are
doing your research - Kody Moodley
21. Challenge accepted - create a FAIR solution and share with others
Existing technologies such as the Semantic Web are widely
accepted as adhering to the Interoperability principle
It's also about how we share our solution:
#fairsolution examples are everywhere; reuse solutions from
existing implementations.
And that's absolutely what we want:
● FAIR Vocabulary using EuroVoc
● Enriched with new terms using RDF standards
● Use Linked Open Data to fulfil FAIR
TO HARMONIZE THE MEANING OF INFORMATION
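To make the idea of enriching a vocabulary with new RDF terms concrete, here is a minimal sketch that describes one new term as RDF triples and serialises them as N-Triples. It assumes a SKOS-style vocabulary; the `https://example.org/vocab/` namespace, the term names and the helper functions are hypothetical placeholders, not real EuroVoc concepts and not the project's actual implementation.

```python
# Standard W3C namespaces for RDF and SKOS.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical namespace for locally added terms (placeholder, not EuroVoc).
EX = "https://example.org/vocab/"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Format a language-tagged string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def new_term(term_id: str, label: str, broader_iri: str) -> list[tuple[str, str, str]]:
    """Describe a new concept and link it to an existing, broader concept."""
    subject = iri(EX + term_id)
    return [
        (subject, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (subject, iri(SKOS + "prefLabel"), literal(label, "en")),
        # Qualified link to a concept in the existing vocabulary.
        (subject, iri(SKOS + "broader"), iri(broader_iri)),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = new_term("data-steward", "data steward", EX + "research-data")
print(to_ntriples(triples))
```

In practice the broader concept would point at an IRI from the shared vocabulary, which is what lets different datasets agree on the meaning of a term.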
22. FAIR software tools:
This solves the “works on my computer, but not on
yours” problem by packaging all the necessary
software dependencies and operating system
resources required by an application in one
independent software “container”.
Let’s start using reproducible research tools
[Figure: reproducibility spectrum, 0% to 100%. Labels: PDF; text, data, code, version; Next Science]
23. Barriers:
● Human:
○ “What you can’t measure, you can’t improve” - Peter Drucker
○ Reward and prestige experience
○ Share personas' responsibilities accordingly
● Technical:
○ Lack of a catalog of tools per faculty
○ Absence of data management guidelines
○ Lack of documentation
Drivers:
● Human:
○ Inclusive with diverse paths to FAIR
○ Community: Permissive culture
○ Individuals: Ownership and engagement
● Technical:
○ Human-centric design (user-friendly interface for all)
Personas = data stewards, data scientists, researchers, data engineers, PhD, master's and bachelor's students, PIs, software developers, managers
24. Lesson 1: Empathy
We must find a way to understand the intentions, behaviour and motivations behind applying FAIR, so that we can build the next services around the personas.
Personas = attract different people with diverse backgrounds, desires, skills and experiences
25. Lesson 2: The “art of planning into the distant future”
but...
“the need of long-term thinking is a matter of utmost urgency, requiring
immediate action in the present.” - The Good Ancestor 2021 p.6
26. Lesson 3: FAIR and Data Science Literacy
“What kind of University would our next generation like to come to? We will find this answer together, but typically the root of the response points to education.”
CDDI Commentary (Draft): https://docs.google.com/document/d/1VQ7hBd8UOdvjGec2BwwCjC2cO3k6b9CU9NQsySh8MT0/edit?usp=sharing
27. Summary and outlook
A little progress over the last year, and work still remains!
But concrete actions were taken:
● FAIR Events and Workshops
● Lessons learned
Next year will focus on:
● Personas engagement
● Increasing FAIR and Data Science literacy
Crucial to connect CDDI showcases with the CDDI data scientist community
Cathedral thinking (a.k.a. long-term work towards a FAIR University)
28. Thank you
You can find me at:
c.utrillaguerrero@maastrichtuniversity.nl