The document discusses using interactive and collaborative AI to help with biodiversity monitoring. It describes how AI can be used to filter out common species like cats and humans from image data, allowing experts to focus on annotating rarer species. Through repeated iterations, the AI is continually retrained to learn new species with the help of human experts. The goal is to save experts tremendous amounts of time processing data by offloading simple tasks to AI and facilitating collaboration between experts and citizens. Infrastructure is needed to store and manage biodiversity data, host AI algorithms, enable interactive data labeling, and deploy trained models to analyze new data.
42. Challenges and Opportunities
● There are ~8.7 million species on our planet
● Data is long-tailed and algorithms are biased
● We need more training data and algorithms
● The number of available algorithms is growing rapidly
● Let’s use them in our workflow!
45. Interactive AI (Iteration 1)
● Analyze data (inference)
● Filter out mundane data (cats and humans)
● Self-learning data selection
● Humans annotate data for classes that are difficult for the algorithm (hedgehogs)
● Training data (data + labels)
● Continued training of the algorithm
46. Interactive AI (Iteration 2)
● Analyze data (inference)
● Filter out mundane data (cats, humans, and hedgehogs)
● Self-learning data selection
● Humans annotate data for classes that are difficult for the algorithm (marten)
● Training data (data + labels)
● Continued training of the algorithm
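The filter-and-annotate loop on the two Interactive AI slides can be sketched as follows. This is a minimal illustration rather than the presenter's implementation: the `Prediction` type, the 0.9 confidence threshold, and the rule "anything not confidently mundane goes to a human" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical class names and threshold, chosen only for illustration.
MUNDANE_CLASSES = {"cat", "human"}
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Prediction:
    image_id: str
    label: str         # most likely class according to the model
    confidence: float  # probability assigned to that class, in [0, 1]


def split_for_annotation(predictions, mundane=MUNDANE_CLASSES,
                         threshold=CONFIDENCE_THRESHOLD):
    """Split model output into auto-filtered images and images for experts.

    An image is filtered out only when the model is confident that it shows
    a mundane class; everything else is queued for human annotation.
    """
    filtered, to_annotate = [], []
    for p in predictions:
        if p.label in mundane and p.confidence >= threshold:
            filtered.append(p)
        else:
            to_annotate.append(p)
    # Show experts the least confident predictions first.
    to_annotate.sort(key=lambda p: p.confidence)
    return filtered, to_annotate


def next_iteration(mundane, newly_learned):
    """After retraining, classes the model has learned join the mundane set."""
    return set(mundane) | set(newly_learned)
```

In this sketch, calling `next_iteration({"cat", "human"}, {"hedgehog"})` after the first round mirrors the step from Iteration 1 to Iteration 2, where hedgehogs move from the expert queue into the automatically filtered classes.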
48. Knowing the Unknown: Open-World Recognition for Biodiversity Datasets, MSc thesis, 2023, Rajesh Gangireddy
49. Challenges and Opportunities
● Biodiversity data is highly fine-grained
● The number of experts who can identify fine-grained species is small and declining
● Willingness to share and collaborate
● Many experts connected to Naturalis
● Biodiversity is a popular topic amongst researchers, students, and citizens
51. Ideas for collaborative annotation
• After data selection, who do I ask to provide the annotation?
• Pay it forward: the algorithm that helped you filter out mundane data was trained on someone else’s annotations (training data). Your effort will benefit many who come after you
• Build up personal profiles to inform others about interests and expertise. If you have data that is relevant to an expert, that expert is likely to be willing to help
• Large opportunities for gamification
53. Source code, data labeling, training, evaluation, comparison, analyze new data
• Open-source code
• Training on the Dutch national supercomputer
• Model serving framework
• Integrated experiment tracking and evaluation
• Comparison and co-development by hosting AI challenges
66. The Long Tail of … data
• Biodiversity
• Medical
• Industry
• Self-driving cars
• Most real-world AI problems
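To make the long tail concrete: when a few classes dominate the training data, a model trained naively tends to favor them. One common mitigation, which the slides do not prescribe, is to weight each class inversely to its frequency during training. The sketch below uses the widely used "balanced" heuristic, `n_samples / (n_classes * class_count)`; the function name and the example labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a per-class weight that is larger for rarer classes.

    Uses the 'balanced' heuristic: n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical label distribution: a common class and a rare one.
labels = ["cat"] * 8 + ["marten"] * 2
weights = inverse_frequency_weights(labels)
```

With these example labels, the rare class receives a weight four times that of the common one, so each marten image contributes more to the training loss than each cat image.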
68. Opportunities
• Leverage significant societal and environmental impact
• Save domain experts (e.g., taxonomists) tremendous amounts of time
processing unlabeled data into labeled and organized data
• Provide full provenance in AI systems (from sensor to report)
• Speed up the development of AI technology, from self-learning data selection through interactive and collaborative annotation to AI training
• Help many non-technical users to fully exploit the latest developments
in the quickly advancing AI domain
69. Interactive and Collaborative AI for
Biodiversity Monitoring and Beyond
dr. Jacob Kamminga
Thank you for your attention!