Interactive and collaborative AI for biodiversity monitoring and beyond - JWKamminga - SRD23

Interactive and Collaborative AI for
Biodiversity Monitoring and Beyond
dr. Jacob Kamminga

Digital Species Identification
Biodiversity monitoring

[Figure: example automated species identifications with confidence scores: Cyanistes caeruleus (0.99), Pieris rapae (0.98)]

Collaborative Annotation Session

[Figure: bar chart over the classes Cat, Human, Hedgehog, Magpie, Marten, and Starling]
Biodiversity data is long-tailed

Challenges and Opportunities
● An estimated ~8.7 million species live on our planet
● Data is long-tailed, so algorithms are biased toward common species
● We need more training data and better algorithms
● New algorithms are springing up like mushrooms
● Let’s use them in our workflow!

Interactive AI (iteration 1)
● Analyze data (inference)
● Filter out mundane data (cats and humans)
● Self-learning data selection
● Humans annotate data for classes that are difficult for the algorithm (hedgehogs)
● Training data (data + labels)
● Continued training of the algorithm

Interactive AI (iteration 2)
● Analyze data (inference)
● Filter out mundane data (cats, humans, and hedgehogs)
● Self-learning data selection
● Humans annotate data for classes that are difficult for the algorithm (martens)
● Training data (data + labels)
● Continued training of the algorithm
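The filtering and self-learning selection steps of each iteration can be sketched as below. This is a minimal illustration, not the platform's implementation. It assumes the classifier returns a probability per class, that the "mundane" classes are a user-chosen set, and that uncertainty is measured by the margin between the two most likely classes (one common active-learning criterion; the slides do not say which criterion is used). The names `select_for_annotation` and `margin`, the confidence threshold, and the annotation budget are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Prediction = Dict[str, float]  # class name -> predicted probability


def margin(probs: Prediction) -> float:
    """Difference between the two highest class probabilities (small = uncertain)."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def select_for_annotation(
    predictions: Dict[str, Prediction],
    mundane: Set[str],
    confidence: float = 0.9,
    budget: int = 2,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split inference results into filtered and to-annotate items.

    An item is filtered out when its top class is in `mundane` and its probability is
    at least `confidence`. The remaining items are ranked by margin (most uncertain
    first) and the `budget` most uncertain ones are sent to human annotators.
    """
    filtered: List[str] = []
    candidates: List[str] = []
    for item_id, probs in predictions.items():
        label, p = max(probs.items(), key=lambda kv: kv[1])
        if label in mundane and p >= confidence:
            filtered.append(item_id)
        else:
            candidates.append(item_id)
    candidates.sort(key=lambda item_id: margin(predictions[item_id]))
    return filtered, candidates[:budget]


if __name__ == "__main__":
    preds = {
        "img1": {"cat": 0.97, "hedgehog": 0.02, "marten": 0.01},
        "img2": {"human": 0.95, "hedgehog": 0.03, "marten": 0.02},
        "img3": {"hedgehog": 0.50, "marten": 0.45, "cat": 0.05},
        "img4": {"hedgehog": 0.80, "marten": 0.15, "cat": 0.05},
        "img5": {"marten": 0.40, "hedgehog": 0.30, "cat": 0.30},
    }
    filtered, to_annotate = select_for_annotation(preds, mundane={"cat", "human"})
    print("filtered:", filtered)         # filtered: ['img1', 'img2']
    print("to annotate:", to_annotate)   # to annotate: ['img3', 'img5']
```

Mirroring iteration 2, the caller would simply widen the set, e.g. `mundane={"cat", "human", "hedgehog"}`, once the retrained model recognizes hedgehogs reliably.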

Goal: save experts tremendous amounts of time

Knowing the Unknown: Open-World Recognition for Biodiversity Datasets, MSc thesis, 2023, Rajesh Gangireddy

Challenges and Opportunities
● Biodiversity data is highly fine-grained
● The number of experts that can identify fine-grained species
is small and declining
● Experts are willing to share and collaborate
● Many experts are connected to Naturalis
● Biodiversity is a popular topic amongst researchers,
students, and citizens

Ideas for collaborative annotation
• After data selection, who do I ask to provide the annotation?
• Pay it forward: the algorithm that helped you filter out mundane data was trained on someone else’s annotations (training data). Your effort will benefit many who come after you
• Build up personal profiles to inform others about interests and
expertise. If you have data that is relevant to an expert, she is likely
willing to help
• Ample opportunities for gamification
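To make "who do I ask to provide the annotation?" concrete, the sketch below ranks experts for one selected item using hypothetical personal profiles: each profile lists the taxa an expert can identify, and an expert's score is the model's total predicted probability on those taxa. The profile format, the scoring rule, and the names `Expert` and `rank_experts` are assumptions for illustration only, not a description of an existing platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Expert:
    """Hypothetical personal profile: a name and the taxa the expert can identify."""
    name: str
    taxa: Set[str] = field(default_factory=set)


def rank_experts(candidates: Dict[str, float], experts: List[Expert], top_n: int = 2) -> List[str]:
    """Rank experts by how much of the model's predicted probability falls on their taxa.

    `candidates` maps candidate taxa to the model's probabilities for one selected item.
    Experts with no overlap are skipped; ties keep the input order (sorting is stable).
    """
    scored: List[Tuple[float, str]] = []
    for expert in experts:
        score = sum(p for taxon, p in candidates.items() if taxon in expert.taxa)
        if score > 0:
            scored.append((score, expert.name))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    experts = [
        Expert("mammal_expert", {"Erinaceus europaeus", "Martes foina", "Martes martes"}),
        Expert("bird_expert", {"Pica pica", "Sturnus vulgaris"}),
        Expert("mustelid_expert", {"Martes foina", "Martes martes"}),
    ]
    # The model is unsure whether the animal is a beech marten or a pine marten.
    item = {"Martes foina": 0.55, "Martes martes": 0.40, "Erinaceus europaeus": 0.05}
    print(rank_experts(item, experts))  # ['mammal_expert', 'mustelid_expert']
```

A real matching scheme could also weigh availability, past annotation quality, or stated interests; the probability-overlap score is just the simplest choice that uses the model's own uncertainty.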

[Figure: platform components]
• Source code: open-source code
• Data labeling
• Training: training on the Dutch national supercomputer
• Evaluation: integrated experiment tracking and evaluation
• Comparison: comparison and co-development by hosting AI challenges
• Analyze new data: model serving framework

[Figure: workflow with provenance: sensor management, interactive data labeling, host open-source algorithms, train on Snellius, integrated evaluation, compare (leaderboards), collaborate & reuse]
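One way to picture provenance from sensor to report is a hash-linked log: each processing step records its details plus the hash of the previous record, so any later change to an earlier step breaks the chain. This is a minimal sketch using only the Python standard library, assumed for illustration; the record fields and the names `add_record` and `verify` are hypothetical and do not describe how the platform actually stores provenance.

```python
import hashlib
import json
from typing import Any, Dict, List


def _digest(record: Dict[str, Any]) -> str:
    """SHA-256 of a record's canonical JSON form (sorted keys, fixed separators)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_record(chain: List[Dict[str, Any]], step: str, details: Dict[str, Any]) -> None:
    """Append a provenance record linked to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else None
    record = {"step": step, "details": details, "prev_hash": prev_hash}
    record["hash"] = _digest(record)  # hash covers step, details, and prev_hash
    chain.append(record)


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Return True if every record's hash and back-link are intact."""
    prev_hash = None
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True


if __name__ == "__main__":
    chain: List[Dict[str, Any]] = []
    add_record(chain, "sensor", {"camera": "cam-01", "file": "img_0001.jpg"})
    add_record(chain, "inference", {"model": "classifier-v1", "label": "Martes foina"})
    add_record(chain, "annotation", {"expert": "mustelid_expert", "label": "Martes martes"})
    add_record(chain, "report", {"species_count": 1})
    print(verify(chain))  # True
    chain[1]["details"]["label"] = "Erinaceus europaeus"  # tamper with an earlier step
    print(verify(chain))  # False
```

The camera, file, and model identifiers above are made up; in practice each record would point to stored artifacts (raw data, model versions, annotations) rather than inline values.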

Required infrastructure
• Data storage, management, and browsing
• Lakehouse architecture + iRODS @ SURF
• Algorithm repository
• Host source code and/or containers
• Evaluate and compare algorithms
• Data annotation platform
• Self-learning data selection (active learning)
• Collaboration (connect experts, set up citizen science campaigns)
• Model serving framework
• Deploy algorithms to analyze data (inference) on flexible compute (AWS, Snellius, Lisa)
• Continued training of algorithms (Snellius @ SURF)
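For evaluating and comparing algorithms, the long tail matters: overall accuracy is dominated by frequent classes such as cats and humans. One common choice, assumed here rather than taken from the slides, is to rank submissions on a leaderboard by macro-averaged recall (balanced accuracy), which weights every class equally. The function names and the toy submissions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of items predicted correctly (dominated by frequent classes)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall: every class counts equally, however rare it is."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def leaderboard(
    y_true: Sequence[str], submissions: Dict[str, Sequence[str]]
) -> List[Tuple[str, float, float]]:
    """Rank submissions by balanced accuracy, reporting plain accuracy alongside."""
    rows = [
        (name, balanced_accuracy(y_true, pred), accuracy(y_true, pred))
        for name, pred in submissions.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Toy long-tailed test set: 8 cats, 1 hedgehog, 1 marten.
    y_true = ["cat"] * 8 + ["hedgehog", "marten"]
    submissions = {
        "always_cat": ["cat"] * 10,
        "tail_aware": ["cat"] * 5 + ["hedgehog", "marten", "hedgehog", "hedgehog", "marten"],
    }
    for name, bal, acc in leaderboard(y_true, submissions):
        print(f"{name}: balanced={bal:.3f} accuracy={acc:.3f}")
    # tail_aware: balanced=0.875 accuracy=0.700
    # always_cat: balanced=0.333 accuracy=0.800
```

The trivial "always cat" model wins on plain accuracy but ranks last on balanced accuracy, which is the behavior a biodiversity leaderboard would usually want.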

Progress
• Data storage, management, and browsing
• Lakehouse architecture + iRODS @ SURF
• Algorithm repository
• Host source code and/or containers
• Evaluate and compare algorithms
• Data annotation platform
• Active learning
• Collaboration (connect experts, set up citizen science campaigns)
• Model serving framework
• Deploy algorithms to analyze data (inference) on flexible compute (AWS, Snellius, Lisa)
• Continued training of algorithms (Snellius @ SURF)

The Long Tail of … data
• Biodiversity
• Medical
• Industry
• Self-driving cars
• Most real-world AI problems

Opportunities
• Achieve significant societal and environmental impact
• Save domain experts (e.g., taxonomists) tremendous amounts of time
processing unlabeled data into labeled and organized data
• Provide full provenance in AI systems (from sensor to report)
• Speed up the development of AI technology across the whole pipeline, from self-learning data selection and interactive, collaborative annotation to AI training
• Help many non-technical users to fully exploit the latest developments
in the quickly advancing AI domain

Interactive and Collaborative AI for
Biodiversity Monitoring and Beyond
dr. Jacob Kamminga
Thank you for your attention!
