Text classification
With Apache Mahout and Lucene
Isabel Drost-Fromm

Software Engineer at Nokia Maps*
Member of the Apache Software Foundation
Co-Founder of Berlin Buzzwords and
Berlin Apache Hadoop GetTogether
Co-founder of Apache Mahout

*We are hiring, talk to me or mail careers@here.com
https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout

… provide your own success story online.
Classification?
January 8, 2008 by Pink Sherbet Photography
http://www.flickr.com/photos/pinksherbet/2177961471/
By freezelight, http://www.flickr.com/photos/63056612@N00/155554663/
http://www.flickr.com/photos/29143375@N05/3344809375/in/photostream/

http://www.flickr.com/photos/redux/409356158/
Image by jasondevilla
http://www.flickr.com/photos/jasondv/91960897/
How a linear classifier sees data
Image by ZapTheDingbat (Light meter)
http://www.flickr.com/photos/zapthedingbat/3028168415
Instance*
(sometimes also called example, item, or in databases a row)
Feature*
(sometimes also called attribute, signal, predictor, co-variate, or column in databases)
Label*
(sometimes also called class, target variable)
Image taken in Lisbon/ Portugal.
Image by jasondevilla
http://www.flickr.com/photos/jasondv/91960897/
● Remove noise.
● Convert text to vectors.
Text consists of terms and phrases.
Encoding issues?
Chinese? Japanese?
“New York” vs. new York?
“go” vs. “going” vs. “went” vs. “gone”?
“go” vs. “Go”?
Terms? Tokens? Wait!
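The questions above about case and word forms are usually settled in an analysis step before any vector is built. The following is a minimal sketch in Python, assuming a plain regular-expression tokenizer with lowercasing. The helper name `tokenize` is hypothetical, and a real setup would more likely delegate this step to a Lucene Analyzer.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (hypothetical helper)."""
    # \w+ matches runs of letters, digits and underscores (Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())

print(tokenize("Go to New York"))  # ['go', 'to', 'new', 'york']
```

Lowercasing makes "go" and "Go" the same term, but it also splits "New York" into two unrelated tokens and does nothing for "going" vs. "went"; those cases need phrase detection and stemming or lemmatization, which this sketch leaves out.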
Now we have terms – how to turn them into vectors?
If we looked at two phrases only:
Sunny weather

High performance computing
Aaron

Zuse
Binary bag of words
● Imagine an n-dimensional space.
● Each dimension = one possible word in texts.
● Entry in vector is one, if word occurs in text.
$$b_{i,j} = \begin{cases} 1 & \text{if } x_i \in d_j \\ 0 & \text{else} \end{cases}$$
● Problem:
– How to know all possible terms in unknown text?
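A minimal Python sketch of the binary encoding, assuming the vocabulary is fixed up front. The function name and the toy vocabulary are illustrative assumptions, not part of the talk.

```python
def binary_bag_of_words(tokens, vocabulary):
    """Return b with b[i] = 1 if vocabulary[i] occurs in tokens, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

# Toy vocabulary (an assumption); each position is one dimension.
vocabulary = ["computing", "high", "performance", "sunny", "weather"]

print(binary_bag_of_words(["sunny", "weather", "sunny"], vocabulary))  # [0, 0, 0, 1, 1]
# A word outside the vocabulary has no dimension and is silently dropped:
print(binary_bag_of_words(["rainy", "weather"], vocabulary))           # [0, 0, 0, 0, 1]
```

The second call shows the problem named above: a term that was not known when the vocabulary was built cannot be represented.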
Term Frequency
● Imagine an n-dimensional space.
● Each dimension = one possible word in texts.
● Entry in vector equal to the word's frequency.
$$b_{i,j} = n_{i,j}$$
● Problem:
– Common words dominate vectors.
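A short Python sketch of term-frequency encoding under the same fixed-vocabulary assumption. The function name and the sample sentence are made up for illustration.

```python
from collections import Counter

def term_frequency(tokens, vocabulary):
    """Return the count of each vocabulary term in tokens."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vocabulary = ["is", "sunny", "the", "weather"]
tokens = "the weather is sunny the sky is blue the end".split()

print(term_frequency(tokens, vocabulary))  # [2, 1, 3, 1]
```

The largest entries belong to "the" and "is", which carry little meaning: this is the dominance of common words noted above.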
TF with stop wording
● Imagine an n-dimensional space.
● Each dimension = one possible word in texts.
● Filter stopwords.
● Entry in vector equal to the word's frequency.
$$b_{i,j} = n_{i,j}$$
● Problem:
– Common and uncommon words with same weight.
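Stop word filtering can be sketched as one extra step before counting. The stop word list below is a tiny hand-picked assumption; real lists are language-specific and much longer.

```python
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "a", "of"}

def term_frequency_without_stopwords(tokens, vocabulary):
    """Count vocabulary terms after dropping stop words."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [counts[term] for term in vocabulary]

vocabulary = ["is", "sunny", "the", "weather"]
tokens = "the weather is sunny the sky is blue the end".split()

print(term_frequency_without_stopwords(tokens, vocabulary))  # [0, 1, 0, 1]
```

The remaining terms now all count equally per occurrence, whether they are rare across the collection or not.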
TF-IDF
● Imagine an n-dimensional space.
● Each dimension = one possible word in texts.
● Filter stopwords.
● Entry in vector equal to the weighted frequency.
$$b_{i,j} = n_{i,j} \times \log \frac{|D|}{|\{d : t_i \in d\}|}$$
● Problem:
– Long texts get larger values.
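The weighting n × log(|D| / |{d : t ∈ d}|) can be sketched directly in Python. This is a minimal version with natural logarithm and no smoothing or length normalization; the function name and the three toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({t: n * math.log(n_docs / df[t]) for t, n in tf.items()})
    return vectors

docs = [
    ["sunny", "weather"],
    ["rainy", "weather"],
    ["high", "performance", "computing"],
]
vectors = tf_idf(docs)
print(vectors[0])
```

In the first document "weather" (present in two of three documents) gets a lower weight than "sunny" (present in one). Repeating every token of a document doubles all of its weights, which is the long-text effect listed as the problem.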
Hashed feature vectors
● Imagine an n-dimensional space.
● Each word in texts = hashed to one dimension.
● Entry in vector set to one, if word hashed to it.
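A minimal Python sketch of the hashing idea. It assumes CRC32 as the hash function, chosen only because it is deterministic across runs, and a vector size of 16; Mahout's own encoders differ in detail.

```python
import zlib

def hashed_vector(tokens, n_dimensions=16):
    """Set vector[h(token) mod n] to one for every token; no vocabulary needed."""
    vector = [0] * n_dimensions
    for token in tokens:
        # CRC32 is an assumption here; any stable hash function would do.
        index = zlib.crc32(token.encode("utf-8")) % n_dimensions
        vector[index] = 1
    return vector

print(hashed_vector(["sunny", "weather"]))
print(hashed_vector(["a", "word", "never", "seen", "before"]))
```

Unknown words need no dictionary lookup, so the vocabulary problem disappears. The price is that two different words may collide on the same dimension, more often the smaller the vector.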
How a linear classifier sees data
HTML → Apache Tika → Fulltext → Lucene Analyzer → Tokenstream+x → FeatureVector Encoder → Vector → Online Learner → Model
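The last stage of the pipeline, an online learner turning vectors into a model, can be sketched as logistic regression trained with stochastic gradient descent. Choosing logistic regression here is an assumption for illustration; the function names, learning rate, epoch count and toy data are all hypothetical.

```python
import math

def predict(weights, x):
    """Probability of the positive class under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_online(examples, n_dimensions, learning_rate=0.5, epochs=20):
    """Update the weights after every single example (online learning)."""
    weights = [0.0] * n_dimensions
    for _ in range(epochs):
        for x, y in examples:
            error = y - predict(weights, x)
            for i, xi in enumerate(x):
                weights[i] += learning_rate * error * xi
    return weights

# Toy vectors: dimension 0 is a constant bias, dimension 1 marks the
# negative class, dimension 2 the positive class.
examples = [([1, 0, 1], 1), ([1, 1, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 0)]
weights = train_online(examples, n_dimensions=3)

print(predict(weights, [1, 0, 1]))  # above 0.5
print(predict(weights, [1, 1, 0]))  # below 0.5
```

Because each update touches one example only, the learner never needs the full data set in memory, which is what makes it fit at the end of a streaming pipeline.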
Image by ZapTheDingbat (Light meter)
http://www.flickr.com/photos/zapthedingbat/3028168415
Goals
● Did I use the best model parameters?
● How well will my model perform in the wild?
Tune model parameters, experiment with tokenization, experiment with vector encoding.
Compute expected performance.
Performance
● Use same data for training and testing.
DON'T
● Problem:
– Highly optimistic.
– Model generalization unknown.
Performance
● Use just a fraction for training.
● Set some data aside for testing.
● Problems:
– Pessimistic predictor: Not all data used for training.
– Result may depend on which data was set aside.
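A hold-out split can be sketched in a few lines of Python. The 20% test fraction and the fixed seed are arbitrary assumptions; the function name is hypothetical.

```python
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle once, then cut off the first part as test data."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10))
print(len(train), len(test))  # 8 2
```

A different seed puts different items into the test set, so the measured performance can change with it; that is the second problem listed above.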
Performance
● Partition your data into n fractions.
● Each fraction set aside for testing in turn.
● Problem:
– Still a pessimistic predictor.
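This scheme is commonly called n-fold cross-validation. A minimal Python sketch follows, assuming the data is already shuffled and assigning items to folds round-robin; the function name is hypothetical.

```python
def k_fold_splits(data, n_folds):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    items = list(data)
    for fold in range(n_folds):
        test = items[fold::n_folds]
        train = [x for i, x in enumerate(items) if i % n_folds != fold]
        yield train, test

for train, test in k_fold_splits(range(10), n_folds=5):
    print(test, train)
```

Every item is used for testing exactly once, and the n scores are typically averaged. Each model still sees only (n-1)/n of the data, hence the remaining pessimism.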
Performance
● Use just a fraction for training.
● Set some data aside for tuning and testing.
DON'T
● Problems:
– Highly optimistic.
– Parameters manually tuned to testing data.
Performance
● Use just a fraction for training.
● Set some data aside for tuning.
● Set another set of data aside for testing.
● Problems:
– Pretty pessimistic as not all data is used.
– May depend on which data was set aside.
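The three-way split can be sketched as follows in Python. The 60/20/20 proportions, the seed and the function name are assumptions for illustration.

```python
import random

def three_way_split(data, tune_fraction=0.2, test_fraction=0.2, seed=42):
    """Return (train, tune, test); the test part is never used for tuning."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_tune = int(len(shuffled) * tune_fraction)
    n_test = int(len(shuffled) * test_fraction)
    tune = shuffled[:n_tune]
    test = shuffled[n_tune:n_tune + n_test]
    train = shuffled[n_tune + n_test:]
    return train, tune, test

train, tune, test = three_way_split(range(10))
print(len(train), len(tune), len(test))  # 6 2 2
```

Parameters are chosen by comparing models on `tune`; `test` is touched once at the very end, so its score is not biased by the tuning.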
Performance Measures
                            Correct prediction: positive   Correct prediction: negative
Model prediction: positive  true positive                  false positive
Model prediction: negative  false negative                 true negative
Accuracy
$$ACC = \frac{\text{true positive} + \text{true negative}}{\text{true positive} + \text{false positive} + \text{false negative} + \text{true negative}}$$
● Problems:
– What if class distribution is skewed?
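A worked example of the skew problem in Python. The counts are made up: 990 negative and 10 positive instances, and a classifier that always predicts "negative".

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

# All 990 negatives are right, all 10 positives are missed.
print(accuracy(tp=0, fp=0, fn=10, tn=990))  # 0.99
```

The score is 99% although the classifier never finds a single positive instance.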
Precision/ Recall
$$\text{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}$$
$$\text{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$$
● Problem:
– Depends on decision threshold.
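The threshold dependence can be made concrete with a small Python sketch. The scores and labels are invented, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall(scores, labels, threshold):
    """Predict positive when score >= threshold, then compute both measures."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 is an assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.6, 0.4, 0.2]  # classifier outputs (made up)
labels = [1, 1, 0, 1, 0]            # 1 = positive, 0 = negative

for threshold in (0.7, 0.5, 0.3):
    print(threshold, precision_recall(scores, labels, threshold))
```

With a threshold of 0.7 precision is 1.0 and recall 2/3; lowering it to 0.3 raises recall to 1.0 and drops precision to 0.75. The same model yields different numbers depending on where the cut is made.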
ROC Curves
[Figure: ROC curve, true orange rate plotted against false orange rate]
AUC – area under ROC
[Figure: area under the ROC curve, same axes]
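The area under the ROC curve equals the probability that a randomly chosen positive instance (an "orange" in the slides' example) receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise formulation rather than integrating the curve; it is quadratic in the number of instances and meant only for illustration, with invented scores.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels))
```

Here five of the six positive/negative pairs are ordered correctly, giving 5/6. A perfect ranking gives 1.0 and random scoring about 0.5; no decision threshold is involved.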
Photo taken by fras1977
http://www.flickr.com/photos/fras/4992313333/
Image by Medienmagazin pro
http://www.flickr.com/photos/medienmagazinpro/6266643422
http://www.flickr.com/photos/generated/943078008/
Apache Hadoop-ready
Recommendations/ Collaborative filtering
kNN and matrix factorization based Collaborative filtering
Classification/ Naïve Bayes, random forest
Frequent item sets/ (P)FPGrowth
Classification/ Logistic Regression/ SGD
Clustering/ Mean shift, k-Means, Canopy, Dirichlet Process, Co-Location search
Sequence learning/ HMM
Math libs/ Mahout collections
LDA
Libraries to have a look at:
Vowpal Wabbit
Mallet
LibSvm
LibLinear
Libfm
Incanter
GraphLab
scikit-learn

Where to get more information:
“Mahout in Action” - Manning
“Taming Text” - Manning
“Machine Learning” - Andrew Ng
https://cwiki.apache.org/confluence/display/MAHOUT/Books+Tutorials+and+Talks
https://cwiki.apache.org/confluence/display/MAHOUT/Reference+Reading
Image by pareeerica
http://www.flickr.com/photos/pareeerica/3711741298/

Frameworks worth mentioning:
Apache Mahout
Matlab/ Octave
Shogun
RapidI

Apache Giraph
R
Weka
MyMediaLite

Get your hands dirty:
http://kaggle.com
https://cwiki.apache.org/confluence/display/MAHOUT/Collections

Where to meet these people:
RecSys
NIPS
KDD
PKDD
ApacheCon
O'Reilly Strata

ICML
ECML
WSDM
JMLR
Berlin Buzzwords
Get started today with the right tools.

January 8, 2008 by dreizehn28
http://www.flickr.com/photos/1328/2176949559
Discuss ideas and problems online.

November 16, 2005 [phil h]
http://www.flickr.com/photos/hi-phi/64055296
Images taken at Berlin Buzzwords 2011/12/13 by
Philipp Kaden. See you there end of May 2014.

Discuss ideas and problems in person.
Become a committer yourself
http://BerlinBuzzwords.de – End of May 2014 in Berlin/ Germany.

Online – user/dev@mahout.apache.org, java-user@lucene.apache.org,
dev@lucene.apache.org

Interest in solving hard problems.
Being part of lively community.
Engineering best practices.

Bug reports, patches, features.
Documentation, code, examples.
Image by: Patrick McEvoy
http://www.flickr.com/photos/29143375@N05/3344809375/in/photostream/

http://www.flickr.com/photos/redux/409356158/
By freezelight, http://www.flickr.com/photos/63056612@N00/155554663/

Text Classification Powered by Apache Mahout and Lucene