Enviar pesquisa
Carregar
Analyzing Power of Tweets in Predicting Commodity Futures
•
3 gostaram
•
1,166 visualizações
Srivatsan Ramanujam
Seguir
Extracting signals from tweets to predict commodity futures.
Leia menos
Leia mais
Dados e análise
Vista de apresentação de diapositivos
Denunciar
Compartilhar
Vista de apresentação de diapositivos
Denunciar
Compartilhar
1 de 20
Baixar agora
Baixar para ler offline
Recomendados
All thingspython@pivotal
All thingspython@pivotal
Srivatsan Ramanujam
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)
Srivatsan Ramanujam
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
Srivatsan Ramanujam
Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
EMC
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
Srivatsan Ramanujam
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Srivatsan Ramanujam
Pivotal OSS meetup - MADlib and PivotalR
Pivotal OSS meetup - MADlib and PivotalR
go-pivotal
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Sarah Aerni
Recomendados
All thingspython@pivotal
All thingspython@pivotal
Srivatsan Ramanujam
Python Powered Data Science at Pivotal (PyData 2013)
Python Powered Data Science at Pivotal (PyData 2013)
Srivatsan Ramanujam
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
Srivatsan Ramanujam
Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
Pivotal: Data Scientists on the Front Line: Examples of Data Science in Action
EMC
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
Srivatsan Ramanujam
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Pivotal Data Labs - Technology and Tools in our Data Scientist's Arsenal
Srivatsan Ramanujam
Pivotal OSS meetup - MADlib and PivotalR
Pivotal OSS meetup - MADlib and PivotalR
go-pivotal
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Sarah Aerni
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
PivotalOpenSourceHub
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
TigerGraph
The MADlib Analytics Library
The MADlib Analytics Library
EMC
Open source analytics
Open source analytics
Ajay Ohri
Machine Learning with Hadoop
Machine Learning with Hadoop
Sangchul Song
Machine Learning and Hadoop
Machine Learning and Hadoop
Josh Patterson
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
TigerGraph
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
TigerGraph
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
MLconf
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
Lukas Vlcek
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
TigerGraph
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
TigerGraph
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
MLconf
Apache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to Apache
PivotalOpenSourceHub
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
MLconf
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
TigerGraph
A sql implementation on the map reduce framework
A sql implementation on the map reduce framework
eldariof
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
TigerGraph
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Ian Huston
Mais conteúdo relacionado
Mais procurados
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
PivotalOpenSourceHub
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
TigerGraph
The MADlib Analytics Library
The MADlib Analytics Library
EMC
Open source analytics
Open source analytics
Ajay Ohri
Machine Learning with Hadoop
Machine Learning with Hadoop
Sangchul Song
Machine Learning and Hadoop
Machine Learning and Hadoop
Josh Patterson
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
TigerGraph
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
TigerGraph
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
MLconf
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
Lukas Vlcek
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
TigerGraph
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
TigerGraph
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
MLconf
Apache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to Apache
PivotalOpenSourceHub
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
MLconf
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
TigerGraph
A sql implementation on the map reduce framework
A sql implementation on the map reduce framework
eldariof
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
TigerGraph
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
Mais procurados
(20)
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
Graph Gurus Episode 1: Enterprise Graph
Graph Gurus Episode 1: Enterprise Graph
The MADlib Analytics Library
The MADlib Analytics Library
Open source analytics
Open source analytics
Machine Learning with Hadoop
Machine Learning with Hadoop
Machine Learning and Hadoop
Machine Learning and Hadoop
Graph Databases and Machine Learning | November 2018
Graph Databases and Machine Learning | November 2018
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Graph Gurus 21: Integrating Real-Time Deep-Link Graph Analytics with Spark AI
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
An Introduction to Apache Hadoop, Mahout and HBase
An Introduction to Apache Hadoop, Mahout and HBase
Plume - A Code Property Graph Extraction and Analysis Library
Plume - A Code Property Graph Extraction and Analysis Library
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
Graph Gurus Episode 12: Tiger Graph v2.3 Overview
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Kaz Sato, Evangelist, Google at MLconf ATL 2016
Apache HAWQ and Apache MADlib: Journey to Apache
Apache HAWQ and Apache MADlib: Journey to Apache
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
Graph Gurus Episode 8: Location, Location, Location - Geospatial Analysis wit...
A sql implementation on the map reduce framework
A sql implementation on the map reduce framework
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
Graph Gurus Episode 25: Unleash the Business Value of Your Data Lake with Gra...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Semelhante a Analyzing Power of Tweets in Predicting Commodity Futures
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Esther Vasiete
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Ian Huston
Predicting Tweet Sentiment
Predicting Tweet Sentiment
Lucinda Linde
Designing a Generative AI QnA solution with Proprietary Enterprise Business K...
Designing a Generative AI QnA solution with Proprietary Enterprise Business K...
IRJET Journal
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
TigerGraph
Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17
Makoto Yui
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdf
Sonal Tiwari
Db tech show - hivemall
Db tech show - hivemall
Makoto Yui
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Greg Makowski
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
PyData
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015
PivotalOpenSourceHub
Deep Learning for Recommender Systems
Deep Learning for Recommender Systems
Nick Pentreath
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Scott Mitchell
Implementing a highly scalable stock prediction system with R, Geode, SpringX...
Implementing a highly scalable stock prediction system with R, Geode, SpringX...
William Markito Oliveira
Introduction To R
Introduction To R
Spotle.ai
What is Chatgpt Complete Guide
What is Chatgpt Complete Guide
Ravendra Singh
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
TigerGraph
Graph Gurus Episode 29: Using Graph Algorithms for Advanced Analytics Part 3
Graph Gurus Episode 29: Using Graph Algorithms for Advanced Analytics Part 3
TigerGraph
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Databricks
Lambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
Semelhante a Analyzing Power of Tweets in Predicting Commodity Futures
(20)
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Data Science Amsterdam - Massively Parallel Processing with Procedural Languages
Predicting Tweet Sentiment
Predicting Tweet Sentiment
Designing a Generative AI QnA solution with Proprietary Enterprise Business K...
Designing a Generative AI QnA solution with Proprietary Enterprise Business K...
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Talk about Hivemall at Data Scientist Organization on 2015/09/17
Talk about Hivemall at Data Scientist Organization on 2015/09/17
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdf
Db tech show - hivemall
Db tech show - hivemall
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Greenplum Database Open Source December 2015
Greenplum Database Open Source December 2015
Deep Learning for Recommender Systems
Deep Learning for Recommender Systems
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Implementing a highly scalable stock prediction system with R, Geode, SpringX...
Implementing a highly scalable stock prediction system with R, Geode, SpringX...
Introduction To R
Introduction To R
What is Chatgpt Complete Guide
What is Chatgpt Complete Guide
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Using Graph Algorithms for Advanced Analytics - Part 2 Centrality
Graph Gurus Episode 29: Using Graph Algorithms for Advanced Analytics Part 3
Graph Gurus Episode 29: Using Graph Algorithms for Advanced Analytics Part 3
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Lambda architecture for real time big data
Lambda architecture for real time big data
Último
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Boston Institute of Analytics
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
aleedritatuxx
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Boston Institute of Analytics
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Thomas Poetter
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
jennyeacort
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
chwongval
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Boston Institute of Analytics
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
Amil Baba Dawood bangali
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
Eduminds Learning
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Rafezzaman
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
Seán Kennedy
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
dataanalyticsqueen03
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
voginip
Real-Time AI Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
Timothy Spann
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
thyngster
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
yuu sss
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
📊 Markus Baersch
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
Cathrine Wilhelmsen
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
yuu sss
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
limedy534
Último
(20)
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
Learn How Data Science Changes Our World
Learn How Data Science Changes Our World
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
Real-Time AI Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Analyzing Power of Tweets in Predicting Commodity Futures
1.
Analyzing the power
of Tweets in predicting Commodity Futures Mar 17, 2014 @gopivotal @being_bayesian Srivatsan Ramanujam Senior Data Scientist Pivotal © Copyright 2013 Pivotal. All rights reserved. 1
2.
Problem Definition Ÿ
Can we predict Corn, Soybean and Wheat futures based on Social Chatter on Twitter ? Ÿ The Customer: A major Agricultural Cooperative @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 2
3.
@gopivotal @being_bayesian Data
© Copyright 2013 Pivotal. All rights reserved. 3
4.
Obtaining Data Ÿ
Used to fetch 5-years of historical tweets matching any of a list of keywords of interest Tweets Table Poster Information @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 4
5.
GNIP @gopivotal @being_bayesian
Ÿ As plugged-in partners, we’ve worked with GNIP before, experience was great! Ÿ We needed historical data and GNIP’s Historical PowerTrack came in handy Ÿ Clean API, quick quotes, convenient to download results of historical jobs © Copyright 2013 Pivotal. All rights reserved. 5
6.
Grain Futures Vs.
Volume of Tweets @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 6
7.
The Platform @gopivotal
@being_bayesian © Copyright 2013 Pivotal. All rights reserved. 7
8.
Data Science Toolkit
Ÿ Appliance – Full Rack DCA with Greenplum Database Ÿ ETL – Python Ÿ Modeling – SQL – MADlib – PL/Python, PL/Java – Ark-Tweet-NLP1 with PL/Java Wrappers Ÿ Visualization – Tableau 1CMU ARK Twitter Parts-of-Speech tagger : http://www.ark.cs.cmu.edu/TweetNLP (GPL 2) @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 8
9.
Pivotal Greenplum MPP
DB @gopivotal @being_bayesian Think of it as multiple PostGreSQL servers Master Segments/Workers Rows are distributed across segments by a particular field (or randomly) © Copyright 2013 Pivotal. All rights reserved. 9
10.
PL/X : X
in {pgsql, R, Python, Java, Perl, C etc.} • Allows users to write Greenplum/ PostgreSQL functions in the R/Python/ Java, Perl, pgsql or C languages Standby Ÿ The interpreter/VM of the language ‘X’ is installed on each node of the Greenplum Database Cluster • Data Parallelism: - PL/X piggybacks on Greenplum’s MPP architecture @gopivotal @being_bayesian Master Segment Host Segment Segment … Master Host SQL Interconnect Segment Host Segment Segment Segment Host Segment Segment Segment Host Segment Segment © Copyright 2013 Pivotal. All rights reserved. 10
11.
Scalable, in-database ML
• Open Source!https://github.com/madlib/madlib • Works on Greenplum DB and PostgreSQL • Active development by Pivotal • Downloads and Docs: http://madlib.net/ @gopivotal @being_bayesian - Latest Release : 1.4 (Dec 2014) © Copyright 2013 Pivotal. All rights reserved. 11
12.
MADlib In-Database Functions
Predictive Modeling Library Generalized Linear Models • Linear Regression • Logistic Regression • Multinomial Logistic Regression • Cox Proportional Hazards • Regression • Elastic Net Regularization • Sandwich Estimators (Huber white, clustered, marginal effects) Matrix Factorization • Single Value Decomposition (SVD) • Low-Rank @gopivotal @being_bayesian Machine Learning Algorithms • Principal Component Analysis (PCA) • Association Rules (Affinity Analysis, Market Basket) • Topic Modeling (Parallel LDA) • Decision Trees • Ensemble Learners (Random Forests) • Support Vector Machines • Conditional Random Field (CRF) • Clustering (K-means) • Cross Validation Linear Systems • Sparse and Dense Solvers Descriptive Statistics Sketch-based Estimators • CountMin (Cormode- Muthukrishnan) • FM (Flajolet-Martin) • MFV (Most Frequent Values) Correlation Summary Support Modules Array Operations Sparse Vectors Random Sampling Probability Functions © Copyright 2013 Pivotal. All rights reserved. 12
13.
@gopivotal @being_bayesian The
Models © Copyright 2013 Pivotal. All rights reserved. 13
14.
The Approach •
In addition to identifying textual cues in tweets that were correlated with commodity futures, we also wanted to analyze whether tweet sentiment was correlated with commodity futures @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 14
15.
Sentiment Analysis –
Challenges Ÿ Language on Twitter doesn’t adhere to rules of grammar, syntax or spelling Ÿ We don’t have labeled data for our problem. The tweets aren’t tagged with sentiment Ÿ Semi-Supervised Sentiment Prediction can be achieved by dictionary look-ups of tokens in a Tweet, but without Context, Sentiment Prediction is futile! @gopivotal @being_bayesian “Cool” © Copyright 2013 Pivotal. All rights reserved. 15
16.
Sentiment Analysis –
Approach Ÿ Parallelized ArkTweetNLP to achieve fast parts-of-speech tagging on Tweets Ÿ Custom (patent pending) algorithm to extract contextual cues & score sentiment of tweets Semi-Supervised Sentiment Classification Phrase Extraction Break-up Tweets into tokens and tag their parts-of-speech Part-of-speech tagger1 1: Parts-of-speech Tagger : Gp-Ark-Tweet-NLP (http://vatsan.github.io/gp-ark-tweet-nlp/) @gopivotal @being_bayesian Phrasal Polarity Scoring Use learned phrasal polarities to score sentiment of new tweets Sentiment Scored Tweets © Copyright 2013 Pivotal. All rights reserved. 16
17.
Text Analytics Pipeline
with GNIP stream Tweet Stream Stored on HDFS (gpfdist) Loaded as external tables into GPDB Parallel Parsing of JSON and extraction of fields using PL/ Python @gopivotal @being_bayesian Topic Analysis through MADlib pLDA Sentiment Analysis through custom PL/Python functions D3.js © Copyright 2013 Pivotal. All rights reserved. 17
18.
Key Take-Aways There
is significant signal in Tweets in predicting commodity futures Sentiment Analysis of tweets can provide an additional signal in predicting commodity futures. Twitter sentiment was negatively correlated with commodity futures, in the sample we analyzed A blended model of Text Regression, Sentiment Analysis and Tweet Actor information gave us encouraging results and we believe that when combined with market fundamentals like weather or yield will give better models @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 18
19.
What’s in it
for me? @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 19
20.
Pivotal Open Source
Contributions http://gopivotal.com/pivotal-products/open-source-software • MADlib – In-database parallel ML - https://github.com/madlib/madlib • PyMADlib – Python Wrapper for MADlib - https://github.com/gopivotal/pymadlib • PivotalR – R wrapper for MADlib - https://github.com/madlib-internal/PivotalR • Part-of-speech tagger for Twitter via SQL - http://vatsan.github.io/gp-ark-tweet-nlp/ Questions? @being_bayesian @gopivotal @being_bayesian © Copyright 2013 Pivotal. All rights reserved. 20
Baixar agora