© 2019 Deep SEARCH 9 GmbH
Deep SEARCH 9
Distributing AI to the Amazon cloud
IC-SDV 2019, 08-09 April, Nice, France
Klaus Kater
Deep SEARCH 9 GmbH
Managing Partner
https://deepsearchnine.com
Sources
Surface Web
Deep Web
Databases
Repositories
Scheduled execution
Unattended retrieval/crawling
Prepare semantic search
Automatic publication
Deep SEARCH 9
Information Consumers
Ontology management
SEARCHCORPORA
• Biotech
• CROs
• Digital Therapeutics
• Technology Transfer Offices
• Clinical trials
• Other scopes of information
• Known (trusted) sources
• More complete
• Faster
Search applications for specific scopes of information
Why move to the cloud?
DS9 needs more and more resources…
…because our search engines keep gobbling information like the cookie monster gobbles cookies!
Growth from 2015 to 2019:
• 30,000 company websites, link depth 3, crawled once every 3 months, ca. 50 GB of data
• 60,000 company websites, link depth 5, crawled every month, ca. 1 TB of data
• 250,000 company websites, link depth 5, crawled twice a month, data volume: ?
Therefore we need:
• More CPU power: content classification, semantic tagging, machine learning
• Faster networks: high bandwidth requirements, network latency problems
• Scalability: availability and responsiveness for users, CPU during analysis, bandwidth during crawling
The only place to get all of this is the cloud!
More CPU power
Option | Price | Hours per year | Yearly budget
EC2 dynamic scaling (r5.4xlarge + 2 TB SSD) | €1.22 per hour | 8,250 | €10,065
Bare metal hardware | €839.00 per month | 8,760 | €10,068
But we need to be able to do the job in about 2 days
EC2 runtime compared to bare metal server
Configuration | Budget (year) | Concurrent DS9 nodes | Hours / day | Hours / month | Hours / year
EC2 10 instances | €10,065 | 10 | 2 | 69 | 825
EC2 20 instances | €10,065 | 20 | 1 | 34 | 413
EC2 50 instances | €10,065 | 50 | - | 14 | 165
EC2 100 instances | €10,065 | 100 | - | 7 | 83
20x as much CPU for the same price!
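The trade-off in the table can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the yearly budget and hourly price quoted on the slide; the function name `hours_per_year` is illustrative and not part of DS9.

```python
# Figures are taken from the slide; names are illustrative assumptions.
PRICE_PER_HOUR_EUR = 1.22    # EC2 r5.4xlarge + 2 TB SSD
YEARLY_BUDGET_EUR = 10_065   # roughly 12 x 839 EUR for the bare metal server


def hours_per_year(instances: int) -> float:
    """Hours per year that `instances` concurrent nodes can run within the budget."""
    total_instance_hours = YEARLY_BUDGET_EUR / PRICE_PER_HOUR_EUR
    return total_instance_hours / instances


for n in (10, 20, 50, 100):
    yearly = hours_per_year(n)
    print(f"{n:>3} instances: {yearly:6.1f} h/year, {yearly / 12:5.1f} h/month")
```

The budget buys a fixed number of instance-hours, so doubling the number of concurrent nodes halves the time each can run.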
Next bullet point: Faster networks
Viewers show the global distribution of companies in our SEARCHCORPORA
Obviously there are many activities in Japan (JPN), India (IND), China (CHN), Korea (KOR), Hong Kong (HKG), Iran (IRN), Pakistan (PAK), Taiwan (TWN), Malaysia (MYS), Bangladesh (BGD), Singapore (SGP), …
Faster networks
Note how Tokyo and Seattle are the same distance from our servers (9,300 km), as are Boston and New Delhi (6,000 km), yet network latency is much higher going east or south
Ping time from DS9 server
Circles are simply squeezed to compensate for Mercator distortion
Faster networks
But can we make the network connection faster?
Simple calculation
Typical page: 30 kB
Typical webserver: 500 pages
From Tokyo
• Transferring 1 page: 1,200 ms
• 500 pages: 500 × 1,200 ms = 10 minutes
• 1,000 servers: 6 days 23 hours
From London
• Transferring 1 page: 82 ms
• 500 pages: 500 × 82 ms = 41 seconds
• 1,000 servers: ca. 11.4 hours
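The simple calculation can be written out as a short sketch. It assumes, as the slide does, that pages are fetched strictly one after another; the function name and its default values are illustrative.

```python
def crawl_time_seconds(ms_per_page: float,
                       pages_per_server: int = 500,
                       servers: int = 1000) -> float:
    """Total time to fetch every page sequentially (no parallelism assumed)."""
    return ms_per_page / 1000 * pages_per_server * servers


for origin, ms in (("Tokyo", 1200), ("London", 82)):
    total = crawl_time_seconds(ms)
    print(f"{origin}: {total / 3600:.1f} hours ({total / 86400:.2f} days)")
```

Because the per-page transfer time multiplies through every page and every server, the difference between the two locations carries over unchanged to the total crawl time.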
No. But we could distribute DS9!
We can distribute DS9 instances across the world using the Amazon cloud
This map shows the Amazon EC2 data center locations
Challenges
Use standards or develop proprietary?
Hadoop is what comes to mind when one hears "distributed analytics"… MapReduce algorithms are good at distributing cut-down analytics tasks across multiple CPUs. This is what we would use at the filter-step level. But MapReduce is not suited to distributing whole filter chains with arbitrary analytics tasks, such as text annotation with ontologies or Deep Web crawling with URLs constructed in real time.
How can we minimize I/O operations?
I/O operations (especially indexing of data) and data transfer are the bottlenecks and could potentially eat up all the benefits of distribution.
1. Data must be read only once from the DS9 backend (no copying)
2. Data must be transferred in compressed chunks (to overcome latency issues)
3. Data must be indexed only once at the final destination on the DS9 backend
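Rule 2 above can be illustrated with a minimal sketch. It is not DS9's actual transfer format: the JSON encoding, the gzip compression, the chunk size, and the function names are all assumptions chosen for illustration.

```python
import gzip
import json
from typing import Iterable, Iterator


def compressed_chunks(records: Iterable[dict], chunk_size: int = 1000) -> Iterator[bytes]:
    """Group records into batches and gzip each batch as a single payload."""
    batch: list[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield gzip.compress(json.dumps(batch).encode("utf-8"))
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield gzip.compress(json.dumps(batch).encode("utf-8"))


def read_chunk(payload: bytes) -> list[dict]:
    """Decompress one payload back into its list of records."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical crawl results, used only to exercise the functions.
pages = [{"url": f"https://example.com/{i}", "text": "lorem ipsum " * 50}
         for i in range(2500)]
chunks = list(compressed_chunks(pages))
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "bytes in total")
```

Sending a few large compressed payloads instead of many small ones means the round-trip latency is paid once per chunk rather than once per record.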
Distributing DS9 instances
DS9 standard node
ds9App
Frontdoor
• Webserver
Firewall
Browserfarm
• DS9
• MySQL
• Elasticsearch
• Blazegraph
• DS9 Farming
• MySQL
• DS9 App
• MySQL
DS9 distributed node: smaller footprint!
Frontdoor
• Webserver
Firewall
• DS9
• MySQL
• Blazegraph
DS9 EC2 cloud clusters
Two new types of DS9 jobs were implemented:
That's what we always did
• Execute a job from the main DS9 host remotely on some other DS9 host for load distribution
• Execute a job from the main DS9 host on a dynamically allocated cluster of EC2 instances that have DS9 Solutions installed
Controlled by DS9 Farming
• URLs read from DS9 main host
• Results written back to DS9 main host
• Start 20 nodes in DS9 cluster mode
• Use t3.xlarge node type (4 vCPUs, 96 GB)
• Run all instances at Amazon in Tokyo
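The cluster settings named on this slide could be captured in a small configuration object. This is a hedged sketch rather than DS9's real configuration model: the class `ClusterSpec` and the placeholder image ID are hypothetical. The region code `ap-northeast-1` is the AWS identifier for Tokyo, and the dictionary keys follow the parameter names of the EC2 RunInstances API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one dynamically allocated DS9 cluster."""
    nodes: int = 20                  # "Start 20 nodes in DS9 cluster mode"
    instance_type: str = "t3.xlarge"
    region: str = "ap-northeast-1"   # AWS region code for Tokyo

    def run_instances_params(self, image_id: str) -> dict:
        """Build the keyword arguments an EC2 RunInstances call would take."""
        return {
            "ImageId": image_id,
            "InstanceType": self.instance_type,
            "MinCount": self.nodes,  # request exactly `nodes` instances
            "MaxCount": self.nodes,
        }


spec = ClusterSpec()
# "ami-PLACEHOLDER" stands in for an image with DS9 Solutions preinstalled.
params = spec.run_instances_params("ami-PLACEHOLDER")
print(spec.region, params)
```

Keeping node count, instance type, and region in one immutable value makes it easy to start the same job in a different region close to the sites being crawled.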
Dynamic cloud allocation
DS9 standard installation
ds9App
Frontdoor
• Webserver
Firewall
Browserfarm
DS9 / IDE
• DS9
• MySQL
• Elasticsearch
• Blazegraph
• DS9 Farming
• MySQL
• DS9 App
• MySQL
Accounting
Instances are dynamically allocated, deployed and started, jobs are executed, and at the end all instances are terminated
AWS Region Tokyo
20x: deployment takes < 5 minutes
• DS9
• MySQL
• Blazegraph
Each node is a full installation of DS9 Solutions (without Elasticsearch)
Finally fully scalable (this satisfies our 3rd need)
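The allocate, execute, terminate lifecycle described on this slide maps naturally onto a context manager, which guarantees that instances are terminated even if a job fails. The sketch below is an assumption about how such a wrapper could look, not DS9 code; `fake_start` and `fake_terminate` are stand-ins for real cloud calls.

```python
from contextlib import contextmanager


@contextmanager
def temporary_cluster(start, terminate, node_count):
    """Start nodes, hand them to the caller, and always terminate them afterwards."""
    nodes = start(node_count)
    try:
        yield nodes
    finally:
        terminate(nodes)  # runs even when the job raises, so nothing keeps billing


# Stand-in callbacks that only record what would happen.
log = []


def fake_start(n):
    log.append(f"start {n}")
    return [f"node-{i}" for i in range(n)]


def fake_terminate(nodes):
    log.append(f"terminate {len(nodes)}")


with temporary_cluster(fake_start, fake_terminate, 3) as nodes:
    log.append(f"run job on {len(nodes)} nodes")

print(log)
```

Since on-demand instances are billed while they run, tying termination to the end of the job keeps the cost bounded by the job's runtime.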
Execute Distributed Job
Only move necessary resources to EC2
DS9 Farming
1. export
2. unpack
3. start nodes
4. import job
5. execute job
6. stop nodes
Claim containers
input
EC2 nodes, each running DS9 Solutions:
• DS9
• MySQL
• Blazegraph
Data flow:
• remote read
• equally distribute URLs among EC2 nodes
• write
• cache
• remote write
…
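The step "equally distribute URLs among EC2 nodes" can be sketched as a round-robin split. This is one simple way to do it and is offered as an assumption; the slides do not say which partitioning scheme DS9 uses, and the function name is illustrative.

```python
def distribute_urls(urls: list[str], node_count: int) -> list[list[str]]:
    """Round-robin split: shard sizes differ by at most one URL."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [urls[i::node_count] for i in range(node_count)]


# Hypothetical input, used only to exercise the function.
urls = [f"https://example.com/page/{i}" for i in range(103)]
shards = distribute_urls(urls, 20)
print([len(shard) for shard in shards])
```

A round-robin split balances the number of URLs per node but not the work per URL; if page sizes or response times vary widely, a shared work queue may balance load better.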
Managed Intelligence 2018
Sources
• Surface Web
• Deep Web
• Databases
• Repositories
Information Scientists (Search Competence Center): expertise of information scientist needed
• Information source selection
• Content structuring
• Linking of disparate sources
• Ontology management
• SEARCHCORPUS® management
Unattended automatic execution of jobs
• Scheduled execution
• Unattended updates
• Prepare semantic search
• Automatic publication
SEARCHCORPORA
• Start-ups
• Competitors
• Regulatory
• New technology
• …
• Known (trusted) sources
• More complete
• Faster
Information Consumers (Internal Customers)
Managed Intelligence 2019
Fully distributed
Sources
• Surface / Deep Web (crawling)
• Company repositories, e.g. Crunchbase
Qualification SEARCHCORPUS®
• Crawling and automatic classification for classes of interest
• Classified targets
Master SEARCHCORPUS®
• Hundreds of thousands of websites
• Many million web pages
• PDF-based publications
• Structured data
• Other sources
Extraction using Lucene query + classification
SEARCHCORPORA
• Biotech
• CROs
• Digital Therapeutics
• Technology Transfer Offices
• Clinical trials
• Other scopes of information
Customize for research target
Quality assurance
Automatic publication:
• target specific focus
Information Consumers (Internal Customers)
Information Scientists (Search Competence Center): expertise of information scientist needed
Unattended automatic execution of jobs
Distributed automatic execution of jobs

IC-SDV 2019: Distributing AI to the Amazon Cloud - Klaus Kater (Deep SEARCH 9, Germany)

  • 1. © 2019 Deep SEARCH 9 GmbH. Deep SEARCH 9: Distributing AI to the Amazon cloud. IC-SDV 2019, 08-09 April, Nice, France. Klaus Kater, Managing Partner, Deep SEARCH 9 GmbH. https://deepsearchnine.com
  • 2. Deep SEARCH 9 overview
      • Sources: Surface Web, Deep Web, Databases, Repositories
      • Deep SEARCH 9: scheduled execution, unattended retrieval/crawling, prepare semantic search, automatic publication, ontology management
      • SEARCHCORPORA: Biotech, CROs, Digital Therapeutics, Technology Transfer Offices, Clinical trials, other scopes of information
      • Information Consumers: search applications for specific scopes of information
      • Known (trusted) sources; more complete; faster
  • 3. Why move to the cloud? DS9 needs more and more resources (timeline 2015 to 2019), because our search engines keep gobbling information like the Cookie Monster gobbles cookies!
      • 30.000 company websites, link depth 3, once every 3 months, ca. 50 GB of data
      • 60.000 company websites, link depth 5, every month, ca. 1 TB of data
      • 250.000 company websites, link depth 5, twice a month, data volume: ?
  • 4. Therefore we need:
      • More CPU power: content classification, semantic tagging, machine learning
      • Faster networks: high bandwidth requirements, network latency problems
      • Scalability: availability and responsiveness for users, CPU during analysis, bandwidth during crawling
    The only place to get all of this is the cloud!
  • 5. More CPU power
  • 6. More CPU power
      • EC2 dynamic scaling (EC2 r5.4xlarge + 2 TB SSD): 1,22 € per hour, 8.250 hours, yearly budget 10.065 €
      • Bare metal hardware: 839,00 € per month, 8.760 hours, yearly budget 10.068 €
  • 7. More CPU power. But we need to be able to do the job in about 2 days. EC2 runtime compared to the bare metal server, each row at the same yearly budget of 10.065 €:
      • EC2 10 instances (10 concurrent DS9 nodes): 2 hours/day, 69 hours/month, 825 hours/year
      • EC2 20 instances (20 concurrent DS9 nodes): 1 hour/day, 34 hours/month, 413 hours/year
      • EC2 50 instances (50 concurrent DS9 nodes): - hours/day, 14 hours/month, 165 hours/year
      • EC2 100 instances (100 concurrent DS9 nodes): - hours/day, 7 hours/month, 83 hours/year
    Pricing basis: EC2 r5.4xlarge + 2 TB SSD at 1,22 € per hour for 8.250 hours (yearly budget 10.065 €); bare metal hardware at 839,00 € per month for 8.760 hours (yearly budget 10.068 €). 20x as much CPU for the same price!
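The hours-per-year column on slide 7 follows from dividing the yearly budget by the hourly EC2 price and by the number of instances. A minimal sketch of that arithmetic, using only the figures on the slide; the function and constant names are hypothetical:

```python
# Figures from slides 6 and 7; all names here are illustrative.
PRICE_PER_HOUR_EUR = 1.22          # EC2 r5.4xlarge + 2 TB SSD
BARE_METAL_PER_MONTH_EUR = 839.00  # bare metal server


def hours_per_instance(yearly_budget_eur: float, instances: int) -> float:
    """Hours each instance can run per year if the budget is split evenly."""
    total_instance_hours = yearly_budget_eur / PRICE_PER_HOUR_EUR
    return total_instance_hours / instances


# The bare metal server costs 12 monthly payments per year.
bare_metal_yearly = BARE_METAL_PER_MONTH_EUR * 12

for n in (10, 20, 50, 100):
    per_year = hours_per_instance(10065.0, n)
    print(f"{n:>3} instances: {per_year:6.1f} h/year, {per_year / 12:5.1f} h/month")
```

The same budget buys fewer hours per instance as the instance count grows, which is the trade the slide makes: short, wide bursts instead of one machine running all year.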
  • 8. Next bullet point: Faster networks. Viewers show the global distribution of companies in our SEARCHCORPORA. Obviously there are many activities in Japan (JPN), India (IND), China (CHN), Korea (KOR), Hong Kong (HKG), Iran (IRN), Pakistan (PAK), Taiwan (TWN), Malaysia (MYS), Bangladesh (BGD), Singapore (SGP), …
  • 9. Faster networks: ping time from the DS9 server. Note how Tokyo and Seattle are the same distance from our servers (9.300 km), as are Boston and New Delhi (6.000 km), but network latency is much higher going east or south. Circles are simply squeezed to compensate for Mercator distortion.
  • 10. Faster networks. But can we make the network connection faster? Simple calculation, assuming a typical page of 30 kB and a typical webserver with 500 pages:
      • From Tokyo: transferring 1 page takes 1.200 ms; 500 pages: 500 x 1.200 ms = 10 minutes; 1.000 servers: 6 days 23 hours
      • From London: transferring 1 page takes 82 ms; 500 pages: 500 x 82 ms = 41 seconds; 1.000 servers: 11,5 hours
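The calculation on slide 10 multiplies the per-page transfer time by pages per server and by the number of servers, assuming pages are fetched strictly one after another. A short sketch of that product; the function name is hypothetical and the sequential-fetch assumption is taken from the slide's own arithmetic:

```python
from datetime import timedelta

PAGES_PER_SERVER = 500  # "typical webserver" on the slide


def crawl_time(ms_per_page: float, servers: int) -> timedelta:
    """Total transfer time when every page is fetched one at a time."""
    return timedelta(milliseconds=ms_per_page * PAGES_PER_SERVER * servers)


for city, ms in (("Tokyo", 1200), ("London", 82)):
    print(city, "| 1 server:", crawl_time(ms, 1), "| 1000 servers:", crawl_time(ms, 1000))
```

The exact products are 6 days 22 hours 40 minutes from Tokyo and roughly 11.4 hours from London; the slide rounds both figures.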
  • 11. No. But we could distribute DS9! We can distribute DS9 instances across the world using the Amazon cloud. This map shows the Amazon EC2 data center locations.
  • 12. Distributing DS9. We can distribute DS9 instances across the world using the Amazon cloud. This map shows the Amazon EC2 data center locations.
  • 13. Challenges
      • Use standards or develop proprietary? Hadoop is what one thinks of when hearing "distributed analytics". MapReduce algorithms are good at distributing cut-down analytics tasks across multiple CPUs; this is what we would use at the filter-step level. But MapReduce is not suited to distributing whole filter chains with arbitrary analytics tasks, such as text annotation with ontologies or Deep Web crawling with URLs constructed in real time.
      • How can we minimize I/O operations? I/O operations (especially indexing of data) and data transfer are the bottlenecks and could potentially eat up all the benefits of distribution.
          1. Data must be read only once from the DS9 backend (no copying)
          2. Data must be transferred in compressed chunks (to overcome latency issues)
          3. Data must be indexed only once at the final destination on the DS9 backend
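Requirement 2 on slide 13 (transfer in compressed chunks) can be illustrated with a small batching routine: many small records become a few large compressed payloads, so the per-request latency is paid once per chunk rather than once per record. This is only a sketch using Python's standard `zlib` and `json` modules; the record layout, the chunk size, and the function names are assumptions and not the DS9 implementation:

```python
import json
import zlib
from typing import Iterable, Iterator, List


def compressed_chunks(records: Iterable[dict], chunk_size: int = 1000) -> Iterator[bytes]:
    """Group records into batches and yield each batch as one compressed blob."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == chunk_size:
            yield zlib.compress(json.dumps(batch).encode("utf-8"))
            batch = []
    if batch:  # final, possibly smaller batch
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def read_chunk(blob: bytes) -> List[dict]:
    """Inverse of one compressed chunk: decompress and decode the batch."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical records standing in for crawl results.
records = [{"url": f"https://example.com/{i}", "text": "lorem ipsum " * 20} for i in range(2500)]
chunks = list(compressed_chunks(records, chunk_size=1000))
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "compressed bytes")
```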
  • 14. Distributing DS9 instances
      • DS9 standard node: ds9App, Frontdoor, Webserver, Firewall, Browserfarm; DS9, MySQL, Elasticsearch, Blazegraph; DS9 Farming, MySQL; DS9 App, MySQL
      • DS9 distributed node: Frontdoor, Webserver, Firewall; DS9, MySQL, Blazegraph. Smaller footprint!
  • 15. DS9 EC2 cloud clusters. Two new types of DS9 jobs were implemented ("That's what we always did"):
      • Execute a job from the main DS9 host remotely on some other DS9 host for load distribution
      • Execute a job from the main DS9 host on a dynamically allocated cluster of EC2 instances that have DS9 Solutions installed
    Controlled by DS9 Farming. URLs are read from the DS9 main host; results are written back to the DS9 main host. Example settings: start 20 nodes in DS9 cluster mode; use t3.xlarge node type (4 vCPUs, 96GB); run all instances at Amazon in Tokyo.
  • 16. Dynamic cloud allocation
      • DS9 standard installation: ds9App, Frontdoor, Webserver, Firewall, Browserfarm, DS9 / IDE; DS9, MySQL, Elasticsearch, Blazegraph; DS9 Farming, MySQL; DS9 App, MySQL
      • Instances are dynamically allocated, deployed and started, jobs are executed, and at the end all instances are terminated. Accounting.
      • AWS Region Tokyo: 20x nodes (DS9, MySQL, Blazegraph); deployment takes < 5 minutes
      • Each node is a full installation of DS9 Solutions (without Elasticsearch)
      • Finally fully scalable (this satisfies our 3rd need)
  • 17. Execute Distributed Job. Only move necessary resources to EC2.
      1. export (DS9 Farming)
      2. unpack
      3. start nodes
      4. import job
      5. execute job: remote read, equally distribute URLs among EC2 nodes, write cache, remote write
      6. stop nodes
    Diagram labels: Claim containers, input; DS9 Solutions nodes (each with DS9, MySQL, Blazegraph), …
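Step 5 on slide 17 says URLs are distributed equally among the EC2 nodes. The slides do not say how the split is done; one simple way to get shares that differ by at most one URL is a round-robin split, sketched below with a hypothetical function name and placeholder URLs:

```python
from typing import List, Sequence


def distribute_urls(urls: Sequence[str], node_count: int) -> List[List[str]]:
    """Round-robin split: node i gets urls[i], urls[i + node_count], ..."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [list(urls[i::node_count]) for i in range(node_count)]


# Placeholder URLs; 20 nodes as in the cluster example on slide 15.
urls = [f"https://example.com/page/{i}" for i in range(1003)]
shares = distribute_urls(urls, 20)
print([len(share) for share in shares])
```

A round-robin split balances URL counts, not work: a share containing a few very large sites would still take longer, so a real scheduler might weight by expected page count instead.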
  • 18. Managed Intelligence 2018
      • Sources: Surface Web, Deep Web, Databases, Repositories
      • Information Scientists (Search Competence Center; expertise of information scientist needed): information source selection, content structuring, linking of disparate sources, ontology management, SEARCHCORPUS® management, prepare semantic search
      • SEARCHCORPORA: Start-ups, Competitors, Regulatory, New technology, …
      • Unattended automatic execution of jobs: scheduled execution, unattended updates, automatic publication
      • Known (trusted) sources; more complete; faster
      • Information Consumers: Internal Customers
  • 19. Managed Intelligence 2019: fully distributed
      • Inputs: company repositories (e.g. Crunchbase); Surface / Deep Web
      • Crawling and automatic classification for classes of interest; classified targets; Qualification SEARCHCORPUS®
      • Master SEARCHCORPUS®: hundreds of thousands of websites, many million web pages, PDF-based publications, structured data, other sources
      • Extraction using Lucene query + classification; customize for research target
      • SEARCHCORPORA: Biotech, CROs, Digital Therapeutics, Technology Transfer Offices, Clinical trials, other scopes of information
      • Automatic publication with target-specific focus; quality assurance
      • Information Scientists (Search Competence Center): expertise of information scientist needed
      • Job execution: crawling; unattended automatic execution of jobs; distributed automatic execution of jobs
      • Information Consumers: Internal Customers
  • 20. Deep SEARCH 9: Distributing AI to the Amazon cloud. IC-SDV 2019, 08-09 April, Nice, France. Klaus Kater, Managing Partner, Deep SEARCH 9 GmbH. https://deepsearchnine.com