Semantic Support for Complex Ecosystem Research Environments
Deborah McGuinness¹, Paulo Pinheiro¹, Henrique Santos¹,², Matthew Klawonn¹, Katherine Chastain¹
¹ Rensselaer Polytechnic Institute, USA
² Universidade de Fortaleza, Brazil
AGU, December 2015
3. Problem Statement
• In large projects, how should data be:
– Integrated with other relevant data and metadata?
– Interpreted?
• And how should it be:
– Accessed, shared, and visualized?
• Examples of data from projects we work on:
– Environmental monitoring
– Architecture science and ecology
4. Foundational Technologies
• Ontologies: For capturing context
– PROV-O
– OBOE
– VSTO
– HASNetO
• Apache Solr: For storage and retrieval
• Contextualized CSVs: For data annotation
• D3 (JavaScript library): For metadata visualization
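Apache Solr is queried over HTTP, so its storage-and-retrieval role can be illustrated with a standard select request. The sketch below builds such a request, filtering documents on a single field with Solr's q, fq, rows, and wt parameters. The host, the core name (measurement), and the field name (characteristic) are hypothetical; the slides do not describe HADatAc's actual Solr schema.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint: Solr's default port with a made-up core name.
SOLR_SELECT = "http://localhost:8983/solr/measurement/select"


def build_query(characteristic: str, rows: int = 10) -> str:
    """Build a Solr select URL that filters on a hypothetical
    `characteristic` field."""
    params = {
        "q": "*:*",                                  # match every document...
        "fq": f'characteristic:"{characteristic}"',  # ...restricted by a filter query
        "rows": rows,                                # maximum number of results
        "wt": "json",                                # response format
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_query("EC-WindSpeed")
    print(url)
    # Decode the query string again to show the parameters Solr would receive.
    print(parse_qs(urlparse(url).query))
```

Using a filter query (fq) rather than folding the condition into q lets Solr cache the filter independently of the main query.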
6. HADatAc
• Human-Aware Data Acquisition Framework
• A web application built on Apache Solr and the Play Framework
• Goal: To provide a one-stop shop for combined data and metadata management, markup, integration, retrieval, and visualization
• Uses ontologies combined with limited human markup to achieve this goal
• Can be deployed on a laptop or a server, depending on a user's needs
8. Data Privacy
• In addition to its visualization, integration, and retrieval features, HADatAc provides privacy mechanisms
• Data can be assigned different levels of access, distinguishing between anonymous and pre-registered users
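The slides do not describe how HADatAc implements its access levels. As a hedged illustration of the idea that some data is open to anonymous users while other data requires registration, the sketch below models two ordered access levels and filters a dataset catalog by the caller's clearance. The level names, the Dataset type, and visible_datasets are hypothetical, and authentication itself is omitted.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class AccessLevel(IntEnum):
    """Hypothetical access levels, ordered from least to most restricted."""
    PUBLIC = 0      # readable by anonymous users
    REGISTERED = 1  # readable only by pre-registered, logged-in users


@dataclass
class Dataset:
    name: str
    access: AccessLevel


def clearance_for(user: Optional[str]) -> AccessLevel:
    """Anonymous callers (user is None) get PUBLIC; registered users get REGISTERED."""
    return AccessLevel.PUBLIC if user is None else AccessLevel.REGISTERED


def visible_datasets(datasets: List[Dataset], user: Optional[str]) -> List[Dataset]:
    """Return only the datasets whose access level the caller is cleared for."""
    clearance = clearance_for(user)
    return [d for d in datasets if d.access <= clearance]


if __name__ == "__main__":
    catalog = [
        Dataset("example01-dataset01", AccessLevel.PUBLIC),
        Dataset("restricted-dataset", AccessLevel.REGISTERED),  # hypothetical
    ]
    print([d.name for d in visible_datasets(catalog, None)])     # anonymous caller
    print([d.name for d in visible_datasets(catalog, "alice")])  # registered caller
```

Ordering the levels as integers keeps the check to a single comparison, and finer-grained levels can be added between the existing ones without changing visible_datasets.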
10. Ease of Use
== START-PREAMBLE ==
@base <http://localhost#> .
@prefix hasneto: <http://hadatac.org/ont/hasneto#> .
@prefix hadatac: <http://hadatac.org/ont/hadatac#> .
<example-kb> a hadatac:KnowledgeBase; hadatac:hasHost "http://localhost"^^xsd:anyURI .
<dataCollection-example01> a hasneto:DataCollection; prov:startedAtTime "2015-02-12T09:30:00Z"^^xsd:dateTime .
<deployment-example01> hasneto:hasDataCollection <dataCollection-example01> .
<example01-dataset01> a vstoi:Dataset; prov:wasGeneratedBy <dataCollection-example01>; hadatac:hasMeasurementType <mt0>,<mt1> .
<mt0> a hadatac:MeasurementType; time:inDateTime <ts0>; hadatac:atColumn 3; oboe:ofCharacteristic hadatac-entities:EC-WindDirection; oboe:usesStandard oboe-standards:Degree .
<mt1> a hadatac:MeasurementType; time:inDateTime <ts0>; hadatac:atColumn 2; oboe:ofCharacteristic hadatac-entities:EC-WindSpeed; oboe:usesStandard oboe-standards:MeterPerSecond .
<ts0> hadatac:atColumn 0 .
== END-PREAMBLE ==
TimeStamp,Record,WindSpdAve_ms,WindDir,WindSpd_ms_Min,WindSpdGust_ms_Max,AirTemp_C_Avg,RH_Pct_Avg,BaroPress_hPa_Avg,Rain_mm_Tot,Hail_Hits_Tot
2015-02-12T09:30:00Z,0,0.99,217.9,0.3,1.7,-4.5,66.58,995,0,0
2015-02-12T09:45:00Z,1,1.112,227.8,0.1,2.1,-4.372,66.45,995,0,0
2015-02-12T10:00:00Z,2,1.169,222.2,0.3,2.6,-4.146,65.98,995,0,0
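The preamble ties CSV columns to ontology terms: column 0 is the timestamp, column 2 holds wind speed in meters per second, and column 3 holds wind direction in degrees (indices are 0-based). The sketch below applies that mapping in plain Python and produces one record per annotated value, with each record carrying its own metadata. The mapping is transcribed by hand rather than parsed from the Turtle, and the record field names are hypothetical; this is an illustration, not HADatAc's code.

```python
import csv
import io

# Column mapping transcribed by hand from the preamble (0-based indices):
#   <ts0> atColumn 0 -> timestamp
#   <mt1> atColumn 2 -> EC-WindSpeed in MeterPerSecond
#   <mt0> atColumn 3 -> EC-WindDirection in Degree
TIMESTAMP_COLUMN = 0
MEASUREMENT_TYPES = {
    2: ("EC-WindSpeed", "MeterPerSecond"),
    3: ("EC-WindDirection", "Degree"),
}

# The CSV body from the slide (header plus three rows).
DATA = """\
TimeStamp,Record,WindSpdAve_ms,WindDir,WindSpd_ms_Min,WindSpdGust_ms_Max,AirTemp_C_Avg,RH_Pct_Avg,BaroPress_hPa_Avg,Rain_mm_Tot,Hail_Hits_Tot
2015-02-12T09:30:00Z,0,0.99,217.9,0.3,1.7,-4.5,66.58,995,0,0
2015-02-12T09:45:00Z,1,1.112,227.8,0.1,2.1,-4.372,66.45,995,0,0
2015-02-12T10:00:00Z,2,1.169,222.2,0.3,2.6,-4.146,65.98,995,0,0
"""


def annotate(csv_text):
    """Turn each annotated cell into a record that carries its metadata.
    Columns without a measurement type (e.g. AirTemp_C_Avg) are skipped."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    records = []
    for row in reader:
        for column, (characteristic, unit) in MEASUREMENT_TYPES.items():
            records.append({
                "time": row[TIMESTAMP_COLUMN],
                "characteristic": characteristic,
                "unit": unit,
                "value": float(row[column]),
            })
    return records


if __name__ == "__main__":
    for record in annotate(DATA):
        print(record)
```

Only the two annotated columns are extracted; the remaining columns stay uninterpreted until the preamble declares measurement types for them, which is the "limited human markup" the slides describe.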
• Works with CSV files
• Automates data transfer across the web, including large amounts of data
• Retrieval (e.g., faceted search) and visualization tools are automatically usable with uploaded data
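Faceted search lets a user narrow results by choosing values of metadata fields (facets) while seeing how many results fall under each value. The in-memory sketch below shows the idea on records shaped like annotated measurements; the field names are hypothetical and the values come from the slide's wind data. A Solr-backed application would more typically delegate these counts to Solr's own faceting feature (facet.field), which this sketch does not use.

```python
from collections import Counter

# Hypothetical records; values are taken from the slide's wind data.
RECORDS = [
    {"characteristic": "EC-WindSpeed", "unit": "MeterPerSecond", "value": 0.99},
    {"characteristic": "EC-WindDirection", "unit": "Degree", "value": 217.9},
    {"characteristic": "EC-WindSpeed", "unit": "MeterPerSecond", "value": 1.112},
    {"characteristic": "EC-WindDirection", "unit": "Degree", "value": 227.8},
]


def facet_counts(records, field):
    """Count the records under each distinct value of one facet field."""
    return Counter(r[field] for r in records)


def apply_filters(records, **selected):
    """Keep only records that match every selected facet value."""
    return [r for r in records
            if all(r.get(field) == value for field, value in selected.items())]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "characteristic"))
    narrowed = apply_filters(RECORDS, characteristic="EC-WindSpeed")
    print(len(narrowed), facet_counts(narrowed, "unit"))
```

Each selection narrows the record set, and the facet counts are recomputed on what remains, which is how a faceted interface guides the user toward non-empty results.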
11. Conclusions
• Various ontologies were presented to show how they capture context in big data projects
• HADatAc was introduced, along with some of its key functionalities
HADatAc is a cross-platform web service that integrates annotated data sets with other relevant data and metadata, and equips them with retrieval (faceted search) and visualization tools, as well as privacy controls.
12. Future Steps
• Refine the HASNetO vocabulary and test it over a constantly growing HASNetO-based knowledge base
• Continue to add functionality to HADatAc:
– More visualization tools
– Enhanced search capabilities
– Integration with laboratory information management systems (potentially for sciences other than medicine)
13. More Information
• Contact Information
– Deborah McGuinness: dlm@cs.rpi.edu
– Paulo Pinheiro: pinhep@rpi.edu
– Matt Klawonn: klawom@rpi.edu