Mapping Commodity Trading in the 19th Century
Benjamin Bach, INRIA, Paris
Asma Malik, University of Strathclyde, Glasgow
Michael Mauderer, University of St Andrews
Sadiq Sani, Robert Gordon University, Aberdeen
Joe Wandy, University of Glasgow
Outline
● Project Overview
● Data
● Technology
● Demo
● Future Work
Overview
● 19th century: commodities, diseases, locations, disasters
Process
Tasks
● Retrieve documents mentioning
○ Commodities
○ Locations
○ Time range
● Relations between retrieved terms
○ Spatial relations
○ Temporal relations
○ Co-occurrence relations
Users: historians
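As a rough illustration of the retrieval task, the Java sketch below filters an in-memory list of documents by commodity, location and an inclusive year range. It is a sketch under stated assumptions: the `Document` record and the `retrieve` method are hypothetical names. The project's server side used Java and SQL against PostgreSQL, so this version only mirrors the filtering logic, not the actual queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DocumentFilter {
    // Hypothetical in-memory record: one document with its year and the
    // commodity and location terms extracted from it.
    record Document(String id, int year, Set<String> commodities, Set<String> locations) {}

    // Returns documents that mention the commodity (and the location, if given)
    // and whose year lies in [fromYear, toYear], both ends inclusive.
    // A null location means "any location".
    static List<Document> retrieve(List<Document> docs, String commodity,
                                   String location, int fromYear, int toYear) {
        List<Document> hits = new ArrayList<>();
        for (Document d : docs) {
            boolean inRange = d.year() >= fromYear && d.year() <= toYear;
            boolean hasCommodity = d.commodities().contains(commodity);
            boolean hasLocation = location == null || d.locations().contains(location);
            if (inRange && hasCommodity && hasLocation) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Document> docs = List.of(
            new Document("d1", 1851, Set.of("cotton", "sugar"), Set.of("Liverpool")),
            new Document("d2", 1870, Set.of("cotton"), Set.of("Bombay")),
            new Document("d3", 1855, Set.of("tea"), Set.of("Liverpool")));
        for (Document d : retrieve(docs, "cotton", "Liverpool", 1850, 1860)) {
            System.out.println(d.id());
        }
    }
}
```

Spatial and temporal relations then fall out of grouping the hits by location or by year.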
Data
● Commodities: 1067
● Time: 1600 - 1952 (352 years)
● Documents: 18 580
● Location occurrences: 91 650 469
● Commodity occurrences: 29 020 013
The Data
● PostgreSQL Database in Edinburgh
○ Not accessible
● PostgreSQL Database in St Andrews
○ Low performance
● PostgreSQL Database Backup
○ 2.5GB compressed binary data
○ Cannot be imported into Amazon RDS
Solution 1
● Create a more compatible SQL export to import into Amazon RDS
○ 24GB raw text file containing SQL statements
○ Still incompatible
○ Hard to correct errors in a timely manner
Solution 2
● Create an EC2 instance running a PostgreSQL database
○ Powerful enough
○ Enough storage
○ Accessible
Big Data Problems
● Simple things take a long time
● Errors and new problems surface only incrementally
The Pipeline
● D3 for client-side presentation
● Java+SQL for server-side processing
Client → Web Service → Database: query by commodities, date range
Database → Web Service → Client: data
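The client sends the web service a set of commodities and a date range. A minimal Java sketch of turning such a request into a typed query is shown below. The query-string format, the parameter names `commodities`, `from` and `to`, and the `QueryParser`/`CommodityQuery` names are all hypothetical; the default years simply reuse the 1600 - 1952 span from the data slide.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryParser {
    // Hypothetical request object: commodities plus an inclusive year range.
    record CommodityQuery(List<String> commodities, int fromYear, int toYear) {}

    // Parses e.g. "commodities=cotton,sugar&from=1850&to=1860".
    // Parameter names are illustrative assumptions, not the project's API.
    static CommodityQuery parse(String rawQuery) {
        Map<String, String> params = new HashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip pairs without a value
            }
            String key = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            params.put(key, value);
        }
        // Comma-separated commodity list; blank entries are dropped.
        List<String> commodities = Arrays.stream(params.getOrDefault("commodities", "").split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .toList();
        // Defaults cover the full time span of the data set (assumption).
        int from = Integer.parseInt(params.getOrDefault("from", "1600"));
        int to = Integer.parseInt(params.getOrDefault("to", "1952"));
        if (from > to) {
            throw new IllegalArgumentException("from must not be after to");
        }
        return new CommodityQuery(commodities, from, to);
    }

    public static void main(String[] args) {
        System.out.println(parse("commodities=cotton,sugar&from=1850&to=1860"));
    }
}
```

Validating the range on the server before touching the database keeps malformed requests from turning into slow queries on a data set of this size.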
Initial Sketches
Visualization
- Space and time
  -> Finding related terms + documents
- Find related documents
- What are the documents talking about?
- Implicit knowledge:
  - Co-occurrences of terms in documents
For every commodity:
1) Get the top 10 documents
2) Limit related terms to 6
3) Sum up co-occurrences
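The three co-occurrence steps can be sketched in Java. This is one possible reading, applied in the order listed: it assumes that the "top" documents are those mentioning the commodity most often, and that the limit of six related terms is applied within each document before the counts are summed. The `Doc` record, its per-document term counts and the `relatedTerms` method are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoOccurrence {
    static final int TOP_DOCUMENTS = 10;     // step 1
    static final int MAX_RELATED_TERMS = 6;  // step 2

    // Hypothetical per-document index: term -> number of mentions in that document.
    record Doc(String id, Map<String, Integer> termCounts) {}

    // Orders entries by count (descending), then alphabetically for stable output.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_DESC =
        Map.Entry.<String, Integer>comparingByValue().reversed()
            .thenComparing(Map.Entry.<String, Integer>comparingByKey());

    static Map<String, Integer> relatedTerms(String commodity, List<Doc> docs) {
        // 1) Top documents: those mentioning the commodity most often (assumed ranking).
        List<Doc> top = docs.stream()
            .filter(d -> d.termCounts().getOrDefault(commodity, 0) > 0)
            .sorted(Comparator.comparingInt((Doc d) -> d.termCounts().get(commodity))
                .reversed()
                .thenComparing(Doc::id))
            .limit(TOP_DOCUMENTS)
            .toList();

        Map<String, Integer> sums = new HashMap<>();
        for (Doc d : top) {
            // 2) Within each document keep at most six other terms (assumed per document).
            List<Map.Entry<String, Integer>> related = d.termCounts().entrySet().stream()
                .filter(e -> !e.getKey().equals(commodity))
                .sorted(BY_COUNT_DESC)
                .limit(MAX_RELATED_TERMS)
                .toList();
            // 3) Sum co-occurrence counts across the top documents.
            for (Map.Entry<String, Integer> e : related) {
                sums.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        // Return the terms ordered by summed count, ready for display.
        Map<String, Integer> ordered = new LinkedHashMap<>();
        sums.entrySet().stream().sorted(BY_COUNT_DESC)
            .forEach(e -> ordered.put(e.getKey(), e.getValue()));
        return ordered;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("d1", Map.of("cotton", 5, "Liverpool", 3, "sugar", 1)),
            new Doc("d2", Map.of("cotton", 2, "Liverpool", 1, "cholera", 4)),
            new Doc("d3", Map.of("tea", 7, "Canton", 2)));
        System.out.println(relatedTerms("cotton", docs));
    }
}
```

Applying the six-term limit after summing instead would guarantee at most six related terms per commodity. The slide does not say which order was used.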
Demo
Future work
- Query by Location
- Time diagrams for term frequency over time
- Encode information in matrix cells (#docs, collection, ...)
- Show and browse documents
- Handle big data: diseases, disasters, ...
- Co-occurrences?
Thank you for listening!