Sparking Social Big Data
linkedin.com/in/sureshsood
@soody
Handouts & Reference Materials
1. NIST Big Data Interoperability Framework, Volume 1: Definitions, Final Version 1, 9/2015
2. Field Guide to Hadoop (preview edition)
3. Learning Spark preview edition
4. Databricks Spark Reference Applications
5. Spark Data Analytics projects/users
Areas for Conversation
Social (content, structure and analytics)
Data Science Primer and Resources: Big data, Spark ecosystem
Data Science Innovation
Roadmap – Evolution from Existing Operations to Predictive
Stages: Rigid, Flexible, Connected, Adaptive, Predictive
What if conversations continue?
(Adapted from Solis, 2012 and Davenport 2007)
Themes
Rigid
• Siloed and rigid
• Hoarding information vs. collaboration
Flexible
• Freely share information and knowledge on an internal basis
• Acting social with customers
• 2-way communications
Connected
• Connected internally and externally
• Listening and learning
• Internal and external engagement shared via hub and spoke
• Employees connected directly to customers
Adaptive
• Agile; integrates customer experiences and feedback loops
• Listening and learning now become analysis and insights
• Makes sense of data and transforms it into intelligence
• Responds in real time
Predictive
• Shift from reactive to proactive and predictive
• Business uses social media heavily and is flexible, connected, adaptive and predictive in terms of customer experiences, needs and new opportunities
• Predict scenarios before they occur to maximise opportunity and limit risk
How can we lead conversations?
(predictive recommendation)
What conversations are next?
Why are these conversations occurring?
What actions are required?
What is the sentiment of conversations?
When and where are conversations taking place?
What conversations are taking place?
Business Intelligence
Types of Intentionality
Social CRM integrates social data
Aquarius,Aries,Cancer,Capricorn,Gemini,Leo,Libra,
Pisces, Sagittarius,Scorpio,Taurus,Virgo
Ambivalent, Employee, Opposer, Reporter, Supporter
11. Committed Partnerships, 12. Compartmentalised Friendship, 13. Childhood Friendship,
14. Courtship, 15. Fling, 16. Secret Affair, 17. Enslavement, 2. Marriages of Convenience,
3. Best Friendships, 4. Kinships, 5. Rebounds/Avoidance-Driven, 6. Courtships, 7. Dependencies,
8. Enmities, 9. Love-Hate (Sweeney and Chew)
Africa,Argentina,Australia,Australia/Hong Kong, Austria,
California, Canada, China, Egypt, England, Finland, France
Germany, Guernsey, Holland, India, Indonesia, Ireland ,
Israel, Italy , Japan, Kuwait, Malaysia, Nepal,Paraguay ,
Philippines, Phillipines, Portugual, Saudi Arabia, Singapore
South Africa, Spain, Sweden, Taiwan, Thailand,UK ,USA
A&F,Beijing ,Gucci,LVMH,New York,Old Navy,
,Paris, Sydney, Tiffany, Tokyo, Tommy, Versace
An-Verb,An-Vis,Hol-Verb,Hol-Vis
Depriv/Enhance,Enhance/Depriv
Data Mining/ Rattle Workflow
Exploring Variable Distributions (Training Data Set)
Data Visualisation of Variables (Training Data Set)
Item Frequency In Support of Association Rules
Display of Decision Tree for Brand as Target Variable
Model Comparison By Variables/Predictors
• Australian Pioneer Dr John Galloway (AM)
• In the 1990s, Ivan Milat killed 7 backpackers, making him Australia's most notorious serial killer
• Everyone in Australia was a suspect
• Large volumes of data from multiple sources
– RTA vehicle records
– Gym memberships
– Gun licensing records
– Internal police records
• Police applied node link analysis techniques (NetMap) to the data
• Harness power of the human mind
• Analyst can spot indirect links, patterns, structure, relationships and anomalies
• A bottom-up approach with process of discovery to uncover structure
• Reduced the suspect list from 18 million to 230
• Further analysis with the use of additional satellite information reduced this to 32
Node Link Analytics
Data Information Knowledge
Key Network Measures
• Degree Centrality
• Betweenness Centrality
• Closeness Centrality
• Eigenvector Centrality
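The first three measures listed above can be sketched in plain Python. This is a minimal illustration, not the tooling used in the deck: the edge list is the commonly cited 10-node Krackhardt kite graph, assumed here because the slide only shows the figure, and the integer node labels are arbitrary. Eigenvector centrality is left out to keep the sketch short.

```python
from collections import deque

# Assumed edge list for the 10-node Krackhardt kite graph (not given on the slide).
EDGES = [
    (0, 1), (0, 2), (0, 3), (0, 5), (1, 3), (1, 4), (1, 6), (2, 3), (2, 5),
    (3, 4), (3, 5), (3, 6), (4, 6), (5, 6), (5, 7), (6, 7), (7, 8), (8, 9),
]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Degree centrality as a raw count of direct ties."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of shortest-path distances (connected graph assumed)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (len(adj) - 1) / sum(dist.values())
    return result


def betweenness_centrality(adj):
    """Unnormalised betweenness via Brandes' algorithm for unweighted graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}


ADJ = build_adjacency(EDGES)
degree = degree_centrality(ADJ)
closeness = closeness_centrality(ADJ)
betweenness = betweenness_centrality(ADJ)
```

On this assumed graph the three measures pick out different nodes, which is the usual reason the kite is shown: the best-connected node is not the one that brokers the most shortest paths.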
krackkite.##h (modified labels)
Connector
(hub)
Diana’s
Clique
Broker
Boundary spanners
Contractor ? Vendor
Collect data
Analyse data
&
find patterns
Theory formulation
Test Theory
inductive reasoning
Language on Twitter Tracks Rates of Coronary Heart
Disease, Psychological Science, January 2015
The findings show that expressions of negative emotions such as anger, stress, and fatigue in the tweets from peo
The results suggest that using Twitter as a window into a community’s collective mental state may provide a usefu
Twitter and Marketing Predictions
• Tweets are “found data”, gathered without asking questions
• More meaning than typical search engine query
• Large numbers of passive participants in natural settings
• Twitter can predict the stock market (Lisa Grossman, Wired, Oct 19 2010)
• Predict movie success in first few weekends of release
– “…it also raises an interesting new question for advertisers and marketing executives. Can they change the
demand for their film, product or service by directly influencing the rate at which people tweet about it?
In other words, can they change the future that tweeters predict?”
Tech Review, http://www.technologyreview.com/blog/arxiv/25000/
Psychological analytics helps put human context into Business
• Behavior data -> Links human emotions to business -> Analyse footprints left behind.
• What does customer satisfaction really mean? Is the person actually happy?
• How do we take the emotional dimension into account for customer experience?
• How do we recognize someone is dissatisfied?
• How do we recognize a “distressed” person?
• Do we use text and voice? Will sleeping patterns and eating habits help?
• Would you act differently if someone is happy?
• How do you coach employees to see how someone sounds in emotional terms?
• Understanding when distress exists and when a customer needs enhanced service
• Behaviour data reveals attitude and intent. This is more predictive of future opportunities and
risk versus historical data
The Newman Model of Deception (Pennebaker et al)
Key word categories for deception mapping:
(1) Self words e.g. “I” and “me” – decrease when someone distances themselves from content
(2) Exclusive words e.g. “but” and “or” – decrease with fabricated content owing to the complexity of maintaining deception
(3) Negative emotion words e.g. “hate” – increase owing to feelings of shame or guilt
(4) Motion verbs e.g. “go” or “move” – increase as exclusive words go down, to keep the story on track
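A rough sketch of how the four key word categories might be counted in a piece of text. The word lists are tiny illustrative samples, and the unweighted `deception_score` combination is a hypothetical stand-in: the slides do not give the dictionaries or the weights behind the published model, nor how the Instagram and Vine thresholds were scored.

```python
import re

# Illustrative (assumed) word lists; a real analysis would use full category dictionaries.
CATEGORIES = {
    "self": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "or", "except", "without"},
    "negative_emotion": {"hate", "angry", "sad", "worthless"},
    "motion": {"go", "move", "walk", "carry", "run"},
}


def category_rates(text):
    """Return the share of words in `text` that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        name: sum(word in vocab for word in words) / total
        for name, vocab in CATEGORIES.items()
    }


def deception_score(rates):
    """Hypothetical unweighted score: higher when the text matches the deceptive profile.

    Fewer self and exclusive words, and more negative emotion words and
    motion verbs, all push the score up.
    """
    return (
        rates["negative_emotion"]
        + rates["motion"]
        - rates["self"]
        - rates["exclusive"]
    )


rates = category_rates("I hate to go but I move")
score = deception_score(rates)
```

Scores like this only become meaningful relative to a baseline for the same platform or author, which is presumably why the slides that follow flag suspects outside a band rather than above a fixed value.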
Instagram Deception (Suspects outside of -20 & +20)
Vine Deception (Suspects outside of -5 and +5)
Variety of Data Types & Big Data Challenge
1. Astronomical
2. Documents
3. Earthquake
4. Email
5. Environmental sensors
6. Fingerprints
7. Health (personal) Images
8. Graph data (social network)
9. Location
10. Marine
11. Particle accelerator
12. Satellite
13. Scanned survey data
14. Sound
15. Text
16. Transactions
17. Video
Big Data consists of extensive datasets primarily in the characteristics of
volume, variety, velocity, and/or variability that require a scalable
architecture for efficient storage, manipulation, and analysis.
Computational portability is the movement of the computation to the location of the data.
Statistics, Data Mining or Data Science ?
• Statistics
– precise deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed data carefully sampled
• Data Science
– trending/correlation analysis over existing data using bulk of population i.e. big data
– Extraction of actionable knowledge directly from data through a process of discovery,
hypothesis, and hypothesis testing.
Adapted from: NIST Big Data taxonomy draft report :
(see http://bigdatawg.nist.gov/show_InputDoc.php)
Data Science Workflows & Discovery
Hadoop Configurations (Single and Multi-Rack)
Adapted from: http://stackiq.com/
Cluster manager e.g. Apache Ambari, Apache Mesos, or Rocks
18 data nodes with 3 TB drives: the configuration represents 648 TB of raw storage.
With the HDFS standard replication factor of 3, this gives 216 TB of usable storage.
Name/secondary/data nodes – 6 core 96 GB
Management node – 4 core 16 GB
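The storage figures above can be checked with a few lines of arithmetic. The number of drives per node is not stated on the slide; the value computed here is simply what the 648 TB raw figure implies.

```python
# Figures taken from the slide.
data_nodes = 18
drive_size_tb = 3
raw_storage_tb = 648
hdfs_replication_factor = 3

# Implied by the raw total; the slide does not state drives per node.
drives_per_node = raw_storage_tb // (data_nodes * drive_size_tb)

# Every block is stored `hdfs_replication_factor` times, so usable space shrinks accordingly.
usable_storage_tb = raw_storage_tb // hdfs_replication_factor

print(f"Implied drives per data node: {drives_per_node}")
print(f"Usable storage: {usable_storage_tb} TB")
```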
Spark - High Level Abstraction (version 1.4.0)
Full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)
Spark Core
SparkR
Powered By Spark
Berkeley Data Analytics Stack (BDAS)
AMPCrowd: RESTful web service for sending tasks to human workers on crowd platforms .
Used by sampleclean.org - Data Cleaning With Algorithms, Machines, and People
Data Science Innovation
Data science innovation is something an
organization has not done before or even
something nobody anywhere has done before. A
data science innovation focuses on discovering
and using new or untraditional data sources to
solve new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
http://tacocopter.com/
New Sources of Information (Big data) : Social Media + Internet of Things
-> Accounting Analytic Innovations
7,919 40,204
2,003,254,102 51
Gridded Data Sources
http://smap.jpl.nasa.gov/
The ANZ Heavy Traffic Index comprises
flows of vehicles weighing more than 3.5
tonnes (primarily trucks) on 11 selected
roads around NZ. It is contemporaneous
with GDP growth.
The ANZ Light Traffic Index is made up of
light or total traffic flows (primarily cars and
vans) on 10 selected roads around the
country. It gives a six month lead on GDP
growth in normal circumstances (but
cannot predict sudden adverse events such
as the Global Financial Crisis).
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
ANZ TRUCKOMETER
Oil reserves shipment monitoring
Ras Tanura Najmah compound, Saudi Arabia
Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
The following BigQuery query (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings,
suicide jackets, and so on):
SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
FROM [gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
    OR V2Themes LIKE '%SUICIDE_ATTACK%'
    OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record,
spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest
open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates
spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.
GDELT + BigQuery = Query The Planet
MEMEX
Source : http://www.scientificamerican.com/slideshow/scientific-american-exclusive-darpa-memex-data-maps/
Also see, http://humantraffickingcenter.org/posts-by-htc-associates/memex-helps-find-human-trafficking-cases-online/
MEMEX - Human Trafficking Analytics
• Human traffickers coerce victims into sex work or low-cost labour, which appears in adverts online
• Adverts contain embedded data on name of worker, contact info, physical characteristics,
services offered, location, price/pay rates, and other attributes. Useful data but not accessible via
SQL or R.
• DeepDive converts “raw set of advertisements into a single clean structured database table”
• 30 million online advertisements for sex work obtained
• Trafficking analytic signals
✴ Traffickers move victims from place to place to keep them isolated and easier to control.
Signal: individuals in the advertisement data who post multiple advertisements from different
physical locations
✴ Non-trafficked sex workers exhibit economic rationality: they charge as much as possible for
services and avoid engaging in risky services. Signal: charging non-market rates or engaging in
risky services
✴ Traffickers may have multiple victims simultaneously. If the contact information for multiple
workers across multiple advertisements contains consecutive phone numbers, it might
suggest one individual purchased several phones at one time.
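Two of the signals above (multiple locations per contact, and consecutive phone numbers) lend themselves to a short sketch once the adverts sit in a structured table. The records, field names and phone numbers below are invented for illustration; they are not MEMEX or DeepDive output, and the functions are hypothetical helpers rather than part of either system.

```python
from collections import defaultdict

# Hypothetical structured records, one per advertisement.
ADS = [
    {"phone": "5550101", "location": "City A"},
    {"phone": "5550101", "location": "City B"},
    {"phone": "5550102", "location": "City A"},
    {"phone": "5550103", "location": "City C"},
    {"phone": "5550199", "location": "City D"},
]


def multi_location_phones(ads):
    """Phones that appear in adverts from more than one location."""
    locations = defaultdict(set)
    for ad in ads:
        locations[ad["phone"]].add(ad["location"])
    return {p: sorted(locs) for p, locs in locations.items() if len(locs) > 1}


def consecutive_phone_runs(ads, min_run=2):
    """Runs of numerically consecutive phone numbers across all adverts.

    Assumes phone numbers are digit-only strings; converting to int drops
    any leading zeros, which is acceptable for this illustration.
    """
    numbers = sorted({int(ad["phone"]) for ad in ads})
    runs, current = [], []
    for number in numbers:
        if current and number == current[-1] + 1:
            current.append(number)
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = [number]
    if len(current) >= min_run:
        runs.append(current)
    return runs


moved = multi_location_phones(ADS)
runs = consecutive_phone_runs(ADS)
```

Either result is only a lead for an analyst to review; a shared phone or a run of adjacent numbers has innocent explanations too.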
DeepDive Data Extraction and Dataset Generation
• URL where the advertisement was found
• Phone number of the person in the advertisement
• Name of the person in the advertisement
• Location where the person offers services
• Rates for services offered
Internet of Things “trillion sensors”
Source: www.tsensorssummit.org
3. Black Box Insurance
• Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management using real data generating individually tailored premiums
• Estimate: for a 20 km journey to work or home, a data point is acquired every minute and the journey captures 12 points per km. Assuming 1,000 km of driving per month, a car generates 12,000 points per
month, or 144,000 points per car per annum. Hence, 1,000 cars lead to 144 million points per annum.
• A telematics (black box) monitor helps assess driving behavior and prices the policy on truly driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
– Location ?
• Benefits low mileage, smooth and safe drivers
• Privacy vs. Saving monies on insurance (Canada ; http://bit.ly/Black_box)
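The data-volume estimate in the bullets above works out as follows. All inputs are the slide's own assumptions; only the variable names are introduced here.

```python
# Assumptions stated on the slide.
points_per_km = 12
km_per_month = 1_000
months_per_year = 12
cars = 1_000

points_per_car_per_month = points_per_km * km_per_month
points_per_car_per_year = points_per_car_per_month * months_per_year
points_per_fleet_per_year = points_per_car_per_year * cars

print(f"Per car per month: {points_per_car_per_month:,}")
print(f"Per car per year:  {points_per_car_per_year:,}")
print(f"Fleet per year:    {points_per_fleet_per_year:,}")
```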
Smart Sandbag System
smart-dove.com
The first 3 columns are the x, y, z axes of the gyroscope, then the x, y, z
axes of the accelerometer. These are raw data from 40 repetitions of the
shoulder press exercise. Standard deviation and a moving average
algorithm are used to build the chart, and a Hidden Markov
Model to extract features and build a model of the exercise. All
models are put into the cloud for trainee exercise scoring.
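A minimal sketch of the smoothing step described above, using only the standard library. The window size and the sample readings are made up for illustration, and the Hidden Markov Model stage is not shown.

```python
from statistics import mean, pstdev


def rolling(values, window, fn):
    """Apply `fn` to each full sliding window of `values`."""
    return [
        fn(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Made-up readings standing in for one sensor axis (e.g. gyroscope x).
gyro_x = [1, 2, 3, 4, 5]

moving_average = rolling(gyro_x, 3, mean)
moving_std = rolling(gyro_x, 3, pstdev)
```

The same helper can be run over each of the six columns; a larger window gives a smoother chart at the cost of blurring the boundary between repetitions.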
6. Supermarket Shopper Behavior
Beacon
Active Card
• The data collected in a single day would take nearly two million years to play back on an MP3 player
• Generates enough raw data to fill 15 million 64GB iPods every day
• The central computer has the processing power of about one hundred million PCs
• Uses enough optical fiber linking up all the radio telescopes to wrap twice around the Earth
• The dishes when fully operational will produce 10 times the global internet traffic as of 2013
• The supercomputer will perform 10^18 operations per second, equivalent to the number of stars in three million Milky
Way galaxies, in order to process all the data produced.
• Sensitivity to detect an airport radar on a planet 50 light years away.
• Thousands of antennas with a combined collecting area of 1,000,000 square meters (1 sq km)
• Previous mapping of the Centaurus A galaxy took a team 12,000 hours of observations and several years; SKA ETA: 5
minutes!
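To put the iPod comparison above into storage units, a quick conversion helps. It assumes decimal units (1 PB = 1,000,000 GB) and takes the slide's figures at face value.

```python
# Figures from the slide.
ipods_per_day = 15_000_000
ipod_capacity_gb = 64

# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

raw_data_gb_per_day = ipods_per_day * ipod_capacity_gb
raw_data_pb_per_day = raw_data_gb_per_day / GB_PER_PB

print(f"Raw data per day: {raw_data_pb_per_day:,.0f} PB")
```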
To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which,
according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came
into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
Galileo
Square Kilometer Array Construction
(SKA1 - 2018-23; SKA2 - 2023-30)
Centaurus A
Data Science Resources
The future is impossible to predict.
However one thing is certain :
The company that can excite its customers'
dreams is out ahead in the race to business success
Selling Dreams, Gian Luigi Longinotti

Mais conteúdo relacionado

Mais procurados

Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapSrinath Perera
 
Personalized News and Video Recomendation System at LinkSure
Personalized News and Video Recomendation System at LinkSurePersonalized News and Video Recomendation System at LinkSure
Personalized News and Video Recomendation System at LinkSureLeanne Hwee
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementationSandip Tipayle Patil
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science suresh sood
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through AnalyticsSrinath Perera
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Bessie Chu
 
Big data characteristics, value chain and challenges
Big data characteristics, value chain and challengesBig data characteristics, value chain and challenges
Big data characteristics, value chain and challengesMusfiqur Rahman
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageSteven Ramage
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big DataIndu Khemchandani
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science EducationJames Hendler
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data Srinath Perera
 

Mais procurados (20)

NewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big DataNewMR 2016 presents: 9 Big Applications of Big Data
NewMR 2016 presents: 9 Big Applications of Big Data
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
Personalized News and Video Recomendation System at LinkSure
Personalized News and Video Recomendation System at LinkSurePersonalized News and Video Recomendation System at LinkSure
Personalized News and Video Recomendation System at LinkSure
 
Data mining with big data implementation
Data mining with big data implementationData mining with big data implementation
Data mining with big data implementation
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through Analytics
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan Approaching Big Data: Lesson Plan
Approaching Big Data: Lesson Plan
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Big data characteristics, value chain and challenges
Big data characteristics, value chain and challengesBig data characteristics, value chain and challenges
Big data characteristics, value chain and challenges
 
Big data
Big dataBig data
Big data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Big Data and Computer Science Education
Big Data and Computer Science EducationBig Data and Computer Science Education
Big Data and Computer Science Education
 
Big data mining
Big data miningBig data mining
Big data mining
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 

Destaque

Patterns of Social Media Marketing Practice
Patterns of Social Media Marketing PracticePatterns of Social Media Marketing Practice
Patterns of Social Media Marketing Practicesuresh sood
 
Australian Computer Society prezzo
Australian Computer Society  prezzo Australian Computer Society  prezzo
Australian Computer Society prezzo suresh sood
 
Crowdsourcing Social Media
Crowdsourcing Social Media Crowdsourcing Social Media
Crowdsourcing Social Media suresh sood
 
Community briefing
Community briefing Community briefing
Community briefing suresh sood
 
Brand mgmt and social media
Brand mgmt and social mediaBrand mgmt and social media
Brand mgmt and social mediasuresh sood
 
Crowdsourcing co creation and ideation
Crowdsourcing co creation and ideationCrowdsourcing co creation and ideation
Crowdsourcing co creation and ideationsuresh sood
 

Destaque (8)

Patterns of Social Media Marketing Practice
Patterns of Social Media Marketing PracticePatterns of Social Media Marketing Practice
Patterns of Social Media Marketing Practice
 
Smmp 3 slides
Smmp 3 slidesSmmp 3 slides
Smmp 3 slides
 
Australian Computer Society prezzo
Australian Computer Society  prezzo Australian Computer Society  prezzo
Australian Computer Society prezzo
 
KLSMMPday1
KLSMMPday1KLSMMPday1
KLSMMPday1
 
Crowdsourcing Social Media
Crowdsourcing Social Media Crowdsourcing Social Media
Crowdsourcing Social Media
 
Community briefing
Community briefing Community briefing
Community briefing
 
Brand mgmt and social media
Brand mgmt and social mediaBrand mgmt and social media
Brand mgmt and social media
 
Crowdsourcing co creation and ideation
Crowdsourcing co creation and ideationCrowdsourcing co creation and ideation
Crowdsourcing co creation and ideation
 

Semelhante a Spark Social Media

Data science Innovations January 2018
Data science Innovations January 2018Data science Innovations January 2018
Data science Innovations January 2018suresh sood
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International DevelopmentAlex Rascanu
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_publicAttila Barta
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your RoleJay Gendron
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analyticssuresh sood
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Data science innovations
Data science innovations Data science innovations
Data science innovations suresh sood
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache HadoopSuman Saurabh
 
Internet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, HiteInternet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, HiteGovLoop
 
Göteborg university(condensed)
Göteborg university(condensed)Göteborg university(condensed)
Göteborg university(condensed)Zenodia Charpy
 
JIMS Rohini IT Flash Monthly Newsletter - October Issue
JIMS Rohini IT Flash Monthly Newsletter  - October IssueJIMS Rohini IT Flash Monthly Newsletter  - October Issue
JIMS Rohini IT Flash Monthly Newsletter - October IssueJIMS Rohini Sector 5
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextMurad Daryousse
 
Data, data, data
Data, data, dataData, data, data
Data, data, dataandrewxhill
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
Agile data science
Agile data scienceAgile data science
Agile data scienceJoel Horwitz
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedPhilip Bourne
 

Semelhante a Spark Social Media (20)

Data science Innovations January 2018
Data science Innovations January 2018Data science Innovations January 2018
Data science Innovations January 2018
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International Development
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your Role
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Foresight Analytics
Foresight AnalyticsForesight Analytics
Foresight Analytics
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Data science innovations
Data science innovations Data science innovations
Data science innovations
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Internet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, HiteInternet of Things: Lightning Round, Hite
Internet of Things: Lightning Round, Hite
 
Göteborg university(condensed)
Göteborg university(condensed)Göteborg university(condensed)
Göteborg university(condensed)
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
JIMS Rohini IT Flash Monthly Newsletter - October Issue
JIMS Rohini IT Flash Monthly Newsletter  - October IssueJIMS Rohini IT Flash Monthly Newsletter  - October Issue
JIMS Rohini IT Flash Monthly Newsletter - October Issue
 
Data Mining With Big Data
Data Mining With Big DataData Mining With Big Data
Data Mining With Big Data
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
 
Data, data, data
Data, data, dataData, data, data
Data, data, data
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
Agile data science
Agile data scienceAgile data science
Agile data science
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 

Mais de suresh sood

Getting to the Edge of the Future - Tools & Trends of Foresight to Nowcasting
Getting to the Edge of the Future - Tools & Trends of Foresight to NowcastingGetting to the Edge of the Future - Tools & Trends of Foresight to Nowcasting
Getting to the Edge of the Future - Tools & Trends of Foresight to Nowcastingsuresh sood
 
Data Science Innovations
Data Science InnovationsData Science Innovations
Data Science Innovationssuresh sood
 
Foresight conversation
Foresight conversationForesight conversation
Foresight conversationsuresh sood
 
Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016suresh sood
 
Beyond dashboards
Beyond dashboardsBeyond dashboards
Beyond dashboardssuresh sood
 
Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesightsuresh sood
 
Australian Business Culture
Australian Business Culture Australian Business Culture
Australian Business Culture suresh sood
 
Transforming instagram data into location intelligence
Transforming instagram data into location intelligenceTransforming instagram data into location intelligence
Transforming instagram data into location intelligencesuresh sood
 
Analytic innovation transforming instagram data into predicitive analytics wi...
Analytic innovation transforming instagram data into predicitive analytics wi...Analytic innovation transforming instagram data into predicitive analytics wi...
Analytic innovation transforming instagram data into predicitive analytics wi...suresh sood
 

Spark Social Media

  • 1. Sparking Social Big Data linkedin.com/in/sureshsood @soody
  • 2. Handouts & Reference Materials 1. NIST Big Data Interoperability Volume 1 Definitions Final Version 1, 9/2015 2. Field Guide to Hadoop (preview edition) 3. Learning Spark preview edition 4. Databricks Spark Reference Applications 5. Spark Data Analytics projects/users
  • 3. Areas for Conversation: Social (content, structure and analytics); Data Science Primer and Resources: Big data, Spark ecosystem; Data Science Innovation
  • 4. Roadmap – Evolution from Existing Operations to Predictive (adapted from Solis, 2012 and Davenport, 2007). Themes: Rigid: siloed; hoarding info vs. collaboration. Flexible: freely share info and knowledge on an internal basis; acting social with customers; two-way communications. Connected: connected internally and externally; listening and learning; internal and external engagement shared via hub and spoke; employees connected directly to customers. Adaptive: agile; integrates customer experiences and feedback loops; listening and learning now become analysis and insights; makes sense of data and transforms it into intelligence; responds in real time. Predictive: shift from reactive to proactive and predictive; the business uses social media heavily and is flexible, connected, adaptive and predictive in terms of customer experiences, needs and new opportunities; predicts scenarios before they occur to maximise opportunity and limit risk. Questions: What if conversations continue? How can we lead conversations? (predictive recommendation) What conversations are next? Why are these conversations occurring? What actions are required? What is the sentiment of conversations? When and where are conversations taking place? What conversations are taking place? (Business Intelligence)
  • 6. Social CRM integrates social data
  • 7. Aquarius,Aries,Cancer,Capricorn,Gemini,Leo,Libra, Pisces, Sagittarius,Scorpio,Taurus,Virgo Ambivalent, Employee, Opposer, Reporter, Supporter 11. Committed Partnerships, 12. Compartmentalised Friendship,13. Childhood friendship,14. Courtship,15. Fling, 16. Secret-Affair, 17. Enslavement , 2. Marriages of Convenience,3. Best Friendships,4. Kinships, 5. Rebounds/ Avoidance-Driven,6. Courtships,7.Dependencies 8. Enmities, 9. Love-Hate (Sweeney and Chew) Africa,Argentina,Australia,Australia/Hong Kong, Austria, California, Canada, China, Egypt, England, Finland, France Germany, Guernsey, Holland, India, Indonesia, Ireland , Israel, Italy , Japan, Kuwait, Malaysia, Nepal,Paraguay , Philippines, Phillipines, Portugual, Saudi Arabia, Singapore South Africa, Spain, Sweden, Taiwan, Thailand,UK ,USA A&F,Beijing ,Gucci,LVMH,New York,Old Navy, ,Paris, Sydney, Tiffany, Tokyo, Tommy, Versace An-Verb,An-Vis,Hol-Verb,Hol-Vis Depriv/Enhance,Enhance/Depriv
  • 12. Exploring Variable Distributions (Training Data Set)
  • 13. Data Visualisation of Variables (Training Data Set)
  • 14. Item Frequency In Support of Association Rules
  • 15. Display of Decision Tree for Brand as Target Variable
  • 16. Model Comparison By Variables/Predictors
  • 17. • Australian pioneer Dr John Galloway (AM) • In the 1990s Ivan Milat killed 7 backpackers, making him Australia's most notorious serial killer • Everyone in Australia was a suspect • Large volumes of data from multiple sources: RTA vehicle records, gym memberships, gun licensing records, internal police records • Police applied node link analysis techniques (NetMap) to the data • Harness the power of the human mind • An analyst can spot indirect links, patterns, structure, relationships and anomalies • A bottom-up approach with a process of discovery to uncover structure • Reduced the suspect list from 18 million to 230 • Further analysis with the use of additional satellite information reduced this to 32 Node Link Analytics Data Information Knowledge
  • 18. Key Network Measures • Degree Centrality • Betweenness Centrality • Closeness Centrality • Eigenvector Centrality krackkite.##h (modified labels) Connector (hub) Diana’s Clique Broker Boundary spanners Contractor ? Vendor
  • 20. Inductive reasoning: collect data → analyse data & find patterns → theory formulation → test theory
  • 23. Language on Twitter Tracks Rates of Coronary Heart Disease, Psychological Science, January 2015. The findings show that expressions of negative emotions such as anger, stress, and fatigue in the tweets from peo… The results suggest that using Twitter as a window into a community’s collective mental state may provide a usefu…
  • 24. Twitter and Marketing Predictions • Tweets are “found data”, gathered without asking questions • More meaning than a typical search engine query • Large numbers of passive participants in natural settings • Twitter can predict the stock market (Lisa Grossman, Wired, Oct 19 2010) • Predict movie success in first few weekends of release – “…it also raises an interesting new question for advertisers and marketing executives. Can they change the demand for their film, product or service by directly influencing the rate at which people tweet about it? In other words, can they change the future that tweeters predict?” Tech Review, http://www.technologyreview.com/blog/arxiv/25000/
  • 25. Psychological analytics helps put human context into business • Behaviour data → links human emotions to business → analyse footprints left behind. • What does customer satisfaction really mean? Is the person actually happy? • How do we take the emotional dimension into account for customer experience? • How do we recognize someone is dissatisfied? • How do we recognize a “distressed” person? • Do we use text and voice? Will sleeping patterns and eating habits help? • Would you act differently if someone is happy? • How do you coach employees to see how someone sounds in emotional terms? • Understanding when distress exists and when a customer needs enhanced service • Behaviour data reveals attitude and intent. This is more predictive of future opportunities and risk than historical data
  • 27. The Newman Model of Deception (Pennebaker et al.) Key word categories for deception mapping: (1) Self words, e.g. “I” and “me”, decrease when someone distances themselves from content (2) Exclusive words, e.g. “but” and “or”, decrease with fabricated content owing to the complexity of maintaining deception (3) Negative emotion words, e.g. “hate”, increase in usage owing to feelings of shame or guilt (4) Motion verbs, e.g. “go” or “move”, increase as exclusive words go down to keep the story on track
  • 28. Instagram Deception (Suspects outside of -20 & +20) Vine Deception (Suspects outside of -5 and +5)
  • 29. Variety of Data Types & Big Data Challenge 1. Astronomical 2. Documents 3. Earthquake 4. Email 5. Environmental sensors 6. Fingerprints 7. Health (personal) Images 8. Graph data (social network) 9. Location 10. Marine 11. Particle accelerator 12. Satellite 13. Scanned survey data 14. Sound 15. Text 16. Transactions 17. Video Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. Computational portability is the movement of the computation to the location of the data.
  • 31. Statistics, Data Mining or Data Science? • Statistics – precise deterministic causal analysis over precisely collected data • Data Mining – deterministic causal analysis over re-purposed data carefully sampled • Data Science – trending/correlation analysis over existing data using bulk of population i.e. big data – Extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and hypothesis testing. Adapted from: NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
  • 32. Data Science Workflows & Discovery
  • 33. Hadoop Configurations (Single and Multi-Rack) Adapted from: http://stackiq.com/ Cluster manager, e.g. Apache Ambari, Apache Mesos, or Rocks. A configuration of 18 data nodes with 3 TB drives represents 648 TB of raw storage; with the HDFS standard replication factor of 3, that gives 216 TB of usable storage. Name/secondary/data nodes – 6 core, 96 GB. Management node – 4 core, 16 GB
  • 34. Spark - High Level Abstraction (version 1.4.0) Full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data) Spark Core SparkR
  • 36. Berkeley Data Analytics Stack (BDAS) AMPCrowd: RESTful web service for sending tasks to human workers on crowd platforms. Used by sampleclean.org - Data Cleaning With Algorithms, Machines, and People
  • 37. Data Science Innovation Data science innovation is something an organization has not done before or even something nobody anywhere has done before. A data science innovation focuses on discovering and using new or untraditional data sources to solve new problems. Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
  • 38. http://tacocopter.com/ New Sources of Information (Big data): Social Media + Internet of Things → Accounting Analytic Innovations 7,919 40,204 2,003,254,102 51 Gridded Data Sources http://smap.jpl.nasa.gov/
  • 39. The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth. The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six month lead on GDP growth in normal circumstances (but cannot predict sudden adverse events such as the Global Financial Crisis). http://www.anz.co.nz/about-us/economic-markets-research/truckometer/ ANZ TRUCKOMETER
  • 40. Oil reserves shipment monitoring Ras Tanura Najmah compound, Saudi Arabia Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
  • 41. The following BigQuery query (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on): SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo FROM [gdeltv2.gkg] where (V2Themes like '%TAX_TERROR_GROUP_ISLAMIC_STATE%' or V2Themes like '%TAX_TERROR_GROUP_ISIL%' or V2Themes like '%TAX_TERROR_GROUP_ISIS%' or V2Themes like '%TAX_TERROR_GROUP_DAASH%') and (V2Themes like '%TERROR%TERROR%' or V2Themes like '%SUICIDE_ATTACK%' or V2Themes like '%TAX_WEAPONS_SUICIDE_%') The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well. GDELT + BigQuery = Query The Planet
  • 42. MEMEX Source: http://www.scientificamerican.com/slideshow/scientific-american-exclusive-darpa-memex-data-maps/ Also see http://humantraffickingcenter.org/posts-by-htc-associates/memex-helps-find-human-trafficking-cases-online/
  • 43. MEMEX - Human Trafficking Analytics • Human traffickers coerce victims into sex work or low-cost labour that appears in adverts online • Adverts contain embedded data on name of worker, contact info, physical characteristics, services offered, location, price/pay rates, and other attributes. Useful data but not accessible via SQL or R. • DeepDive converts “raw set of advertisements into a single clean structured database table” • 30 million advertisements for sex work obtained online • Trafficking analytic signals ✴ Traffickers move victims from place to place to keep them isolated and easier to control. Detect individuals in the advertisement data who post multiple advertisements from different physical locations ✴ Non-trafficked sex workers exhibit economic rationality: they charge as much as possible for services and avoid engaging in risky services. Charging non-market rates or engaging in risky services is therefore a signal ✴ Traffickers may have multiple victims simultaneously. If the contact information for multiple workers across multiple advertisements contains consecutive phone numbers, it might suggest one individual purchased several phones at one time. Source: http://www.scientificamerican.com/slideshow/scientific-american-exclusive-darpa-memex-data-maps/ Also see http://humantraffickingcenter.org/posts-by-htc-associates/memex-helps-find-human-trafficking-cases-online/
  • 44. DeepDive Data Extraction and Dataset Generation • URL where the advertisement was found • Phone number of the person in the advertisement • Name of the person in the advertisement • Location where the person offers services • Rates for services offered
  • 45. Internet of Things “trillion sensors” Source: www.tsensorssummit.org
  • 46. 3. Black Box Insurance • Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management using real data to generate individually tailored premiums • Estimate: 20 km work or home journey, with a data point acquired every minute and the journey capturing 12 points per km. Assume 1,000 km of driving per month, generating 12,000 points per month, or 144,000 points per car per annum. Hence 1,000 cars lead to 144 million points per annum. • Telematics technology (a black box monitor) helps assess driving behaviour and prices the policy with true driver-centric premiums by capturing: – Number of journeys – Distances travelled – Types of roads – Speed – Time of travel – Acceleration and braking – Any accidents – Location? • Benefits low-mileage, smooth and safe drivers • Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
  • 47. Smart Sandbag System smart-dove.com The first 3 columns are the x, y, z axes of the gyroscope, then the x, y, z axes of the accelerometer. These are raw data from 40 repetitions of the shoulder press exercise. Standard deviation and a moving average algorithm are used to build the chart, and a Hidden Markov Model to extract features and build a model of the exercise. All models are put into the cloud for trainee exercise scoring.
  • 48. 6. Supermarket Shopper Behavior Beacon Active Card
  • 50. • The data collected in a single day would take nearly two million years to play back on an MP3 player • Generates enough raw data to fill 15 million 64GB iPods every day • The central computer has the processing power of about one hundred million PCs • Uses enough optical fiber linking up all the radio telescopes to wrap twice around the Earth • The dishes when fully operational will produce 10 times the global internet traffic as of 2013 • The supercomputer will perform 10^18 operations per second - equivalent to the number of stars in three million Milky Way galaxies - in order to process all the data produced. • Sensitivity to detect an airport radar on a planet 50 light years away. • Thousands of antennas with a combined collecting area of 1,000,000 square meters (1 sq km) • Previous mapping of Centaurus A galaxy took a team 12,000 hours of observations and several years - SKA ETA 5 minutes! To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which, according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came into existence. As a scientist, this is a once in a lifetime opportunity.” Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska Galileo Square Kilometer Array Construction (SKA1 - 2018-23; SKA2 - 2023-30) Centaurus A
  • 52. The future is impossible to predict. However, one thing is certain: the company that can excite its customers' dreams is out ahead in the race to business success. Selling Dreams, Gian Luigi Longinotti
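
Slides 33 and 46 both rest on back-of-envelope arithmetic: the Hadoop cluster sizing and the telematics data volume. The following is a minimal Python sketch that reproduces both sets of figures. The function and parameter names are illustrative only. The value of 12 drives per data node is not stated on the slide; it is inferred here from 648 TB ÷ (18 nodes × 3 TB) and should be read as an assumption.

```python
def hdfs_capacity_tb(data_nodes, drives_per_node, drive_tb, replication=3):
    """Return (raw, usable) storage in TB for a simple HDFS cluster.

    Usable storage is raw storage divided by the replication factor,
    because HDFS keeps `replication` copies of every block.
    """
    raw = data_nodes * drives_per_node * drive_tb
    return raw, raw / replication


def telematics_points_per_year(cars, km_per_month=1000, points_per_km=12):
    """Return the number of telematics data points collected per year."""
    points_per_car_per_month = km_per_month * points_per_km
    return cars * points_per_car_per_month * 12


# Slide 33: 18 data nodes with 3 TB drives.
# drives_per_node=12 is an inferred assumption, not a figure from the slide.
raw_tb, usable_tb = hdfs_capacity_tb(data_nodes=18, drives_per_node=12, drive_tb=3)
print(raw_tb, usable_tb)  # 648 216.0

# Slide 46: 1,000 cars, 1,000 km per month, 12 points per km.
print(telematics_points_per_year(cars=1000))  # 144000000
```

Keeping the replication factor and the sampling density as parameters makes it easy to see how sensitive each total is to a single assumption, for example how usable storage changes if the replication factor is lowered to 2.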

Editor's Notes

  1. Combine traditional and social data to create a Social CRM. Build social fields into customer contact information. Track social media interactions with customers. Understand where customers hang out, using social media data. Collect customer feedback from social channels.
  2. Diana – max links (degree centrality), most connected – connector or hub – number of nodes connected – high influence in spreading info or a virus. Heather – best location; a powerful figure as broker who determines what flows and what doesn’t – single point of failure – high betweenness = high influence – position of node as gatekeeper to exploit structural holes (gaps in network). Fernando & Garth – shortest paths = closeness – the bigger the number the less central. Eigenvector = importance of node in network; Google’s PageRank is a similar measure – being connected to the well connected is a popularity and power measure.
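
The four measures in note 2 can be checked by hand on a small graph. The sketch below is a minimal, standard-library-only illustration, not the tooling used for the slides. It assumes the figure on slide 18 is the standard ten-node Krackhardt kite, so the edge list and every name other than Diana, Heather, Fernando and Garth are assumptions. Closeness is computed here as (n − 1) divided by the sum of shortest-path distances, so a larger value means more central (the inverse of the distance-sum convention in the note). Eigenvector centrality uses plain power iteration.

```python
from collections import deque

# Assumed edge list: the standard Krackhardt kite. Only Diana, Heather,
# Fernando and Garth are named in the notes; the other names are placeholders.
EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diana"),
    ("Andre", "Fernando"), ("Beverly", "Diana"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diana"), ("Carol", "Fernando"),
    ("Diana", "Ed"), ("Diana", "Fernando"), ("Diana", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph):
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs (unnormalised pair counts)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(graph, iterations=200):
    """Power iteration on the adjacency matrix, L2-normalised each step."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iterations):
        nxt = {v: sum(x[w] for w in graph[v]) for v in graph}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


def leaders(scores, tol=1e-9):
    """Names whose score ties for the maximum, sorted alphabetically."""
    best = max(scores.values())
    return sorted(v for v, s in scores.items() if best - s <= tol)


graph = build_graph(EDGES)
print("degree:     ", leaders(degree_centrality(graph)))
print("betweenness:", leaders(betweenness_centrality(graph)))
print("closeness:  ", leaders(closeness_centrality(graph)))
print("eigenvector:", leaders(eigenvector_centrality(graph)))
```

On this assumed edge list the leaders agree with the roles described in the note: Diana has the highest degree, Heather the highest betweenness, and Fernando and Garth tie for closeness.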