Sentiment analysis of sentences in the Serbian language

Nikola Milošević

Why analyze sentiment in Serbian?
● Great industrial need
  – Automated market research
  – Ad websites
  – Customer satisfaction
● NLP tools for Serbian are underdeveloped
  – Need for tools and resources
  – Almost no tools accessible through an API

Serbian language
● Belongs to the Indo-European language family
● Slavic language
● Highly inflectional
● 3 pronunciation types
● 3 dialect groups
● Write as you speak
● Latin and Cyrillic writing systems
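
Because Serbian is written in both scripts, the same word can reach the analyzer in two forms. The slides do not say whether or how the script was normalized, so the following is only a minimal sketch of one option: converting Cyrillic to Latin before tokenization. The function name `to_latin` is hypothetical; the letter mapping is the standard one between the two Serbian alphabets.

```python
# Serbian Cyrillic -> Latin mapping (lowercase). Three letters map to digraphs.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic characters to Latin; leave the rest as-is."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # Preserve capitalisation: an uppercase "Љ" becomes "Lj".
            out.append(lat.capitalize() if ch != lower else lat)
        else:
            out.append(ch)
    return "".join(out)


print(to_latin("Србија"))
```

Text that is already in Latin script passes through unchanged, so the function can be applied to mixed input.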

Sentiment analysis workflow

Tokenization and preprocessing
● Tokenization: breaking a stream of text up into words
● Stop-word filtering
● Negation handling
  – Add an NE_ prefix to words that follow a negation
  – Applies to all words up to the next punctuation mark
● Irregular verbs
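
The preprocessing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation behind the slides: the stop-word and negation lists are tiny hypothetical samples, the name `preprocess` is made up, and dropping the negation word itself is a design choice the slides do not confirm.

```python
import re

# Hypothetical, deliberately tiny word lists; a real system needs fuller ones.
STOP_WORDS = {"i", "je", "u", "na", "se", "da"}
NEGATIONS = {"ne", "nije", "nisam", "neću"}
PUNCTUATION = set(".,!?;:")

# A token is either a run of word characters or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def preprocess(text: str) -> list[str]:
    """Tokenize, filter stop words, and mark negated words with NE_."""
    result = []
    negated = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
            continue
        if tok in NEGATIONS:
            negated = True  # the negation word itself is dropped (assumption)
            continue
        if tok in STOP_WORDS:
            continue
        result.append("NE_" + tok if negated else tok)
    return result


print(preprocess("Film nije dobar, ali muzika je odlična."))
# ['film', 'NE_dobar', 'ali', 'muzika', 'odlična']
```

With the prefix, "dobar" and "NE_dobar" become separate features, so a bag-of-words classifier can learn opposite polarities for them.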

Stemming
● Process for reducing inflected words to their stem, base or root form
● Kešelj and Šipka (2008)
● Hand-crafted rule-based stemmer
● ~300 rules
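
To show the general shape of a rule-based suffix-stripping stemmer, here is a minimal sketch. The suffix list is a hypothetical handful of common endings chosen for illustration; it is not the rule set of Kešelj and Šipka (2008) or of the stemmer described in these slides, and the minimum stem length of 3 is likewise an assumption.

```python
# Hypothetical suffix rules for illustration only, tried longest first.
SUFFIX_RULES = sorted(
    ["ovima", "ama", "ima", "om", "em", "a", "e", "i", "o", "u"],
    key=len,
    reverse=True,
)
MIN_STEM_LEN = 3  # assumed guard so that short words are not over-stemmed


def stem(word: str) -> str:
    """Strip the first (longest) matching suffix, keeping a minimum stem length."""
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


for w in ["knjiga", "knjige", "knjigama", "gradovima", "dan"]:
    print(w, "->", stem(w))
```

Different inflected forms such as "knjiga", "knjige" and "knjigama" collapse to the same stem, which reduces the vocabulary the classifier has to learn from a small corpus.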

Sentiment analysis
● Aim: build a binary sentiment classifier
● General Serbian language
● No annotated corpus for Serbian
● Annotation work (~1000 short texts)
● Supervised machine learning

Naive Bayes
● Algorithm that learns fast
● Bag-of-words approach
● Assumption of conditional independence
● Laplace smoothing
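
The four points above fit together in a few lines of code. The following is a minimal sketch of a multinomial Naive Bayes classifier over bags of words with Laplace (add-one) smoothing. The class name, method names and toy training data are hypothetical, and skipping words never seen in training is an assumption rather than something the slides state.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # smoothing constant (1.0 = add-one)
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training (assumption)
                count = self.word_counts[label][tok]
                # Conditional independence: per-word log likelihoods are summed.
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy data, invented for illustration.
docs = [["dobar", "film"], ["odličan", "film"], ["loš", "film"], ["NE_dobar", "film"]]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["dobar"]), model.predict(["NE_dobar"]))
```

Training is a single counting pass over the data, which is why the algorithm learns fast. Smoothing keeps a word that never occurred in one class from forcing that class's probability to zero.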

Implementation
● Web API with presentation layer
● JSON communication
● Secured page for annotating
● Built using PHP and MySQL
● Web & Android
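
The slides do not document the API's request or response format, so the following is only a sketch of what JSON communication around a classifier could look like. The field names "text", "sentiment" and "error" and the function `handle_request` are all hypothetical, and the sketch is in Python rather than the PHP used by the actual system.

```python
import json


def handle_request(body: str, classify) -> str:
    """Turn a JSON request body into a JSON response string.

    `classify` is any callable that maps a text to a label. The field names
    "text", "sentiment" and "error" are hypothetical.
    """
    try:
        payload = json.loads(body)
        text = payload["text"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a 'text' field"})
    # ensure_ascii=False keeps Serbian letters readable in the response.
    return json.dumps({"text": text, "sentiment": classify(text)}, ensure_ascii=False)


print(handle_request('{"text": "Odličan film"}', lambda t: "positive"))
```

Keeping the classifier behind a plain JSON interface is what lets the web and Android clients share the same backend.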

Results
● Stemmer
  – 90% correct on news articles
  – Smallest and most precise stemmer
  – Problems: small words, irregular inflections, sound changes
● Sentiment analyzer
  – 80% correct
  – Problems: irony, ambiguity, small training data

Future work
● Stemmer
  – Use Snowball framework
  – Build multi-step stemmer
● Sentiment analyzer
  – POS tagging
  – Complex negation handling
  – SVM algorithm

Thank you
● Available from http://inspiratron.org
● Contact: nikola.milosevic@postgrad.manchester.ac.uk
