WP3: overview of the progress of WP# at the CLARIAH day
1. 1
Common Lab Research Infrastructure for the Arts and Humanities
2. 2
• WP3 as part of CLARIAH
– Discipline: Linguistics
– Data type: primarily textual data
• WP3 as successor of CLARIN
• WP3 'incorporates' Nederlab (NWO-groot)
3. 3
• Linguistics
– Support for the researcher in each stage of a research project
• What is needed
• What is available
• What functionality must be created / improved
• Cooperation projects with WP2, WP4 Soc Econ & WP5 Media Studies
4. 4
• Theme 1: Data and metadata
• Theme 2: Interoperability
• Theme 3: Enrichment and annotation
• Theme 4: Search and research
5. 5
• New Resources
– text corpora, crowd sourcing, survey tool, databases
• Existing Resources
– browsing & searching for data and tools and selecting them
• Enriching resources
– curation, linguistic annotations, transcription, named entities
• Searching / analyzing (enriched) resources
• Representation/visualization search results
• Store new resources in CLARIAH
• Make enhanced publications
6. 6
• Incorporate data / tools in CLARIAH
– With proper metadata
– With IPR/Ethical Issues properly dealt with
– Archiving / Ingest functionality
– Deployment Framework
• How to run services efficiently
– Required: standardization (input – output formats), metadata, interface elements
– Interoperability (syntactic and semantic)
7. 7
• Interoperability
• Linked
Open
Data
• CMDI → RDF
• Entities
• Vocabularies
• PICCL
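As an illustration of the "CMDI → RDF" item above, the following is a minimal sketch of turning a flat metadata record into RDF triples in N-Triples syntax. It is not the WP3 conversion itself: the record layout, the `record_to_ntriples` helper, and the `http://example.org/resource/` base URI are assumptions made for this example, and real CMDI records are namespaced and profile-specific. Only the Dublin Core Terms namespace is an existing vocabulary.

```python
import xml.etree.ElementTree as ET

# Toy metadata record. This is NOT a real CMDI profile: real CMDI records
# use XML namespaces and profile-specific components.
RECORD = """
<record id="corpus-001">
  <title>Example Dutch text corpus</title>
  <language>nld</language>
  <creator>Example Institute</creator>
</record>
"""

# Hypothetical base URI for resources (assumption for this sketch).
BASE = "http://example.org/resource/"
# Dublin Core Terms namespace (an existing vocabulary).
DCTERMS = "http://purl.org/dc/terms/"


def record_to_ntriples(xml_text):
    """Turn each child element of the record into one N-Triples line."""
    root = ET.fromstring(xml_text)
    subject = f"<{BASE}{root.get('id')}>"
    triples = []
    for child in root:
        # Escape backslashes and double quotes for an N-Triples literal.
        value = (child.text or "").strip()
        value = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{DCTERMS}{child.tag}> "{value}" .')
    return triples


if __name__ == "__main__":
    for line in record_to_ntriples(RECORD):
        print(line)
```

Mapping element names directly onto vocabulary terms only works because the toy record happens to use Dublin Core names; a real conversion would need an explicit mapping per metadata profile.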
8. 8
• Cooperation WP4 / WP5
– Text -> structured data
– WP4: e.g. detect strikes in newspapers of 1965, Athena
– WP5: probably convert scanned and OCR'ed 'filmladders' into structured data
– Speech -> text
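To make "Text -> structured data" concrete, here is a deliberately naive sketch of pulling strike events out of newspaper text. It is not the method used in the WP4 cooperation: the article snippets, the newspaper names, the keyword pattern for Dutch "staking" (strike), and the `extract_strikes` helper are all invented for this illustration.

```python
import re

# Hypothetical OCR'ed newspaper snippets (invented for illustration).
ARTICLES = [
    {"date": "1965-03-02", "paper": "Example Courant",
     "text": "In Rotterdam brak gisteren een staking uit onder havenarbeiders."},
    {"date": "1965-03-05", "paper": "Example Courant",
     "text": "De gemeenteraad vergaderde over de nieuwe brug."},
    {"date": "1965-04-11", "paper": "Voorbeeld Dagblad",
     "text": "Stakingen in Amsterdam duren voort."},
]

# Keyword pattern: 'staking' or 'stakingen' as a whole word, any case.
STRIKE = re.compile(r"\bstaking(en)?\b", re.IGNORECASE)
# Naive place detection: a capitalised word directly after 'in' / 'In'.
PLACE = re.compile(r"\b[Ii]n ([A-Z][a-z]+)")


def extract_strikes(articles):
    """Return one structured record per article that mentions a strike."""
    events = []
    for article in articles:
        if not STRIKE.search(article["text"]):
            continue
        place = PLACE.search(article["text"])
        events.append({
            "date": article["date"],
            "paper": article["paper"],
            "place": place.group(1) if place else None,
        })
    return events


if __name__ == "__main__":
    for event in extract_strikes(ARTICLES):
        print(event)
```

A keyword match like this misses compounds and OCR errors and cannot tell a report of a strike from a mention of one, which is the gap that linguistic enrichment and named-entity annotation are meant to close.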
10. 10
• Search application for treebanks
• LASSY, CGN
• One's own corpus
• Special word relations interface, XPATH interface
• New:
– metadata in the search query (period, sex, region, etc.)
– results can be presented as aggregate or split by metadata
• Illustrations:
– CGN (Spoken Dutch Corpus) with metadata
– Dutch CHILDES Corpora with metadata
– http://zardoz.service.rug.nl:8067/
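The combination of an XPATH query with metadata, reported either as an aggregate or split by a metadata field, can be sketched as follows. This is not the search application's code: the three parsed sentences, their metadata values, and the `count_hits` helper are invented for illustration. The parse trees only imitate the nested `<node>` style of the Alpino XML used for LASSY, and the query uses the XPath subset supported by Python's standard library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy corpus of (metadata, parse) pairs. Sentences and metadata values
# are invented; the XML imitates nested <node> elements with rel/cat/pos.
CORPUS = [
    ({"region": "north", "sex": "f"},
     '<alpino_ds><node cat="smain" rel="--">'
     '<node rel="su" cat="np"><node rel="det" pos="det" word="de"/>'
     '<node rel="hd" pos="noun" word="vrouw"/></node>'
     '<node rel="hd" pos="verb" word="slaapt"/>'
     '</node></alpino_ds>'),
    ({"region": "south", "sex": "m"},
     '<alpino_ds><node cat="smain" rel="--">'
     '<node rel="su" cat="np"><node rel="det" pos="det" word="de"/>'
     '<node rel="hd" pos="noun" word="man"/></node>'
     '<node rel="hd" pos="verb" word="leest"/>'
     '<node rel="obj1" cat="np"><node rel="det" pos="det" word="een"/>'
     '<node rel="hd" pos="noun" word="boek"/></node>'
     '</node></alpino_ds>'),
    ({"region": "north", "sex": "m"},
     '<alpino_ds><node cat="smain" rel="--">'
     '<node rel="su" pos="pron" word="hij"/>'
     '<node rel="hd" pos="verb" word="slaapt"/>'
     '</node></alpino_ds>'),
]

# XPath query: every noun phrase node, at any depth.
QUERY = ".//node[@cat='np']"


def count_hits(corpus, xpath, split_by=None, **filters):
    """Count XPath matches, optionally filtered and split by metadata."""
    counts = Counter()
    for meta, xml_text in corpus:
        # Skip sentences whose metadata does not satisfy every filter.
        if any(meta.get(key) != value for key, value in filters.items()):
            continue
        hits = len(ET.fromstring(xml_text).findall(xpath))
        counts[meta[split_by] if split_by else "all"] += hits
    return dict(counts)


if __name__ == "__main__":
    print(count_hits(CORPUS, QUERY))                     # aggregate
    print(count_hits(CORPUS, QUERY, split_by="region"))  # split by region
    print(count_hits(CORPUS, QUERY, split_by="region", sex="m"))
```

Keeping the metadata filter separate from the XPath expression means the same syntactic query can be reused across periods, regions, or speaker groups without being rewritten.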
11. 11
• Search application for treebanks (LASSY, CGN, SONAR)
• Example-based interface, XPATH interface
• New: Uploading one's own corpus
16. 16
WP scientific leader: Sjef Barbiers (Meertens)
Technical coordinator: Daan Broeder (Meertens)
WP3 advisor: Jan Odijk
Leader RUN: Antal van den Bosch
Leader VU: Piek Vossen
Leader INL: Jan Theo Bakker
Leader UU: Jan Odijk
Leader RUG: Gertjan van Noord
Leader Meertens: Marc Kemps-Snijders