CS 8803 Social Computing Data Mini-Project
                      Harish Kanakaraju Prashanth Palanthandalam




Problem I


Method:

The goal is to analyze the prominence of the people who follow a particular celebrity. The three celebrities analyzed were:

      Britney Spears
      Mariah Carey
      Ashley Tisdale

All three are singers and are among the top 11 most influential celebrities on Twitter. Britney Spears has close to 7.7 million followers, while Ashley Tisdale and Mariah Carey have approximately 4.3 million each.

Samples of each celebrity's followers were analyzed to determine how many of them were prominent. The prominence of each follower was computed as “number of followers / number of accounts followed”: the higher the value, the higher the prominence.
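As a minimal sketch of the prominence formula, the R function below computes followers / following for a vector of accounts. The helper name `prominence` is hypothetical, the counts are made up, and returning `NA` for accounts that follow nobody (where the ratio would otherwise be `Inf` or `NaN`) is our assumption; the report does not say how such accounts were handled.

```r
# Hypothetical helper: prominence = number of followers / number of accounts followed.
# Accounts that follow nobody would give Inf (or NaN for 0/0); we return NA for them
# (an assumption, not something the report specifies).
prominence <- function(followers, following) {
  ratio <- followers / following
  ratio[following == 0] <- NA
  ratio
}

# Illustrative (made-up) counts for four sampled followers
followers <- c(120, 15, 3000, 40)
following <- c(400, 300, 150, 0)
ratios <- prominence(followers, following)
print(ratios)
median(ratios, na.rm = TRUE)
```

Dropping such accounts with `na.rm = TRUE` keeps a single account that follows nobody from turning the mean and SD into `Inf`.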

We used sample sizes of 1500, 2000 and 3000. For the sample size of 3000, the confidence interval (margin of error) is ±1.8% at a 95% confidence level, given the total population of each celebrity's followers.
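The 1.8 figure is consistent with reading it as a margin of error for a proportion. The sketch below checks this under assumptions that are ours, not the report's: simple random sampling, worst-case proportion p = 0.5, and a finite population correction with N = 4.3 million (using 7.7 million changes the result only in later decimal places). The function name `margin_of_error` is hypothetical.

```r
# Hypothetical helper: margin of error for a proportion, assuming simple random
# sampling, worst-case p = 0.5 and a finite population correction.
margin_of_error <- function(n, N, conf = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for 95% confidence
  fpc <- sqrt((N - n) / (N - 1))   # finite population correction
  z * sqrt(p * (1 - p) / n) * fpc
}

sizes <- c(1500, 2000, 3000)
moe <- margin_of_error(sizes, N = 4.3e6)
round(100 * moe, 1)   # in percentage points: 2.5 2.2 1.8
```

Under these assumptions, the sample of 3000 gives the ±1.8% quoted above, and the smaller samples give wider intervals.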

The analysis was first run with a sample size of 1500 and then repeated with larger samples to study the effect of sample size on the prominence ratio.

Results:

Prominence ratio (followers / following), by sample size (SS):

SS = 1500          Mean     Median   SD
Britney Spears     0.288    0.056    2.047
Mariah Carey       0.265    0.132    1.383
Ashley Tisdale     0.239    0.115    0.880

SS = 2000          Mean     Median   SD
Britney Spears     0.546    0.111    3.067
Mariah Carey       0.289    0.163    1.230
Ashley Tisdale     0.406    0.130    7.007

SS = 3000          Mean     Median   SD
Britney Spears     0.493    0.081    3.403
Mariah Carey       0.258    0.154    1.014
Ashley Tisdale     0.348    0.133    5.734


Basic Analysis:


Because of outliers, the mean and the standard deviation can swing either way from one sample to the next: a single very prominent follower in the sample is enough to inflate both. The ordering of the medians, however, stays the same across all three sample sizes.
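A small illustration of this point, using made-up ratios rather than the collected data: replacing one value with a very large one (a single very prominent follower) shifts the mean and SD dramatically, while the median does not move.

```r
# Illustrative (made-up) prominence ratios for a small sample
x <- c(0.05, 0.10, 0.12, 0.15, 0.20)

# The same sample, except that one follower is a very prominent account
y <- x
y[5] <- 250

c(mean = mean(x), median = median(x), sd = sd(x))
c(mean = mean(y), median = median(y), sd = sd(y))
```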

Using the median: Mariah Carey has more prominent followers than Ashley Tisdale, and Ashley Tisdale has more prominent followers than Britney Spears.

Fig 1 shows that Britney Spears has a relatively high number of low-prominence followers (ratio close to zero), while Ashley Tisdale and Mariah Carey have many followers with a moderate prominence value, a region where Britney has few followers. That is why her median is the lowest of the three.

Fig 2 shows that Britney Spears has relatively more very prominent followers than Ashley Tisdale and Mariah Carey, which is consistent with her having the highest mean at every sample size despite having the lowest median. However, very prominent followers are extremely rare compared with the whole population.

R Commands used:

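The following sequence was executed for each of the three celebrities (shown here for Ashley Tisdale):

library(twitteR)   # assumes Twitter OAuth was already set up with setup_twitter_oauth()
at4 <- getUser("ashleytisdale")
at4Fl <- at4$getFollowers(n = 3000)                      # sample of 3000 followers
at4FFl <- sapply(at4Fl, function(u) u$followersCount)    # followers of each follower
at4FFd <- sapply(at4Fl, function(u) u$friendsCount)      # accounts each follower follows
at4Ratio <- at4FFl / at4FFd                              # prominence ratio
med <- median(at4Ratio)
stad <- sd(at4Ratio)
meanRatio <- mean(at4Ratio)
at4sum <- sum(at4Ratio)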
Chi-square test (bs4sum and mc4sum come from repeating the sequence above for Britney Spears and Mariah Carey):

chisq.test(c(at4sum, bs4sum, mc4sum))
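Called with a single vector and no `p` argument, R's `chisq.test()` runs a goodness-of-fit test against equal expected proportions, i.e. it tests whether the three sums are equal. The sketch below shows this on made-up sums (not the project's values). Note that the test is designed for counts, so its use on sums of ratios should be interpreted with caution.

```r
# Made-up totals standing in for bs4sum, at4sum and mc4sum
sums <- c(britney = 30, ashley = 20, mariah = 10)

# One vector, no 'p' argument: test against equal proportions (1/3 each)
res <- chisq.test(sums)
res$expected    # each expected count is 20
res$statistic   # X-squared = 10, with df = 2
res$p.value
```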

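Plotting the graphs (executed only once):

library(ggplot2)
library(reshape2)
xyz <- cbind(bs4Ratio, at4Ratio, mc4Ratio)                 # one column per celebrity
data <- melt(xyz, varnames = c("follower", "celebrity"))   # long format: follower, celebrity, value
# Fig 1: histogram of prominence ratios, coloured by celebrity
lowProminence <- ggplot(data, aes(x = value, colour = celebrity)) +
  geom_histogram(binwidth = 50)
print(lowProminence)
# Fig 2: jittered points per celebrity, showing the few very prominent followers
highP <- ggplot(data, aes(x = celebrity, y = value))
print(highP + geom_point(position = "jitter"))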




Fig 1: Low-prominence followers

Fig 2: High-prominence followers

Problem II

Method:

The goal is to extract tweets from two different geographic locations and select those containing the phrase “I want”. The preferences of Twitter users in the two locations were compared with respect to the phrases “I want a pizza” and “I want to sleep”. The mood of users on Monday and Friday was also studied, by extracting tweets containing “Monday” and “I hate”, and “Friday” and “Thank God”.

The searchTwitter() function of the twitteR package for R was used. The two cities chosen were Seattle, Washington, and Southampton, UK. For each city, 1000 tweets containing “I want” were extracted within a 20-mile radius:

southamTweets = searchTwitter("I want", n = 1000, geocode = "50.903,-1.40625,20mi")

The list of 1000 tweets is then converted to text using lapply():

southamTweets.text = lapply(southamTweets, function(tweet) tweet$getText())



grep() is then used to find the tweets containing the term “pizza”, ignoring case; it returns their positions in the list:

southamTweets.spec = grep("pizza", southamTweets.text, ignore.case = TRUE)
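`grep()` with `ignore.case = TRUE` (its third argument) matches regardless of case and returns the indices of the matching elements, so the number of matching tweets is `length()` of the result. A small sketch on made-up tweet texts, which also shows that a plain pattern matches substrings such as “Pizzas”:

```r
# Made-up tweet texts standing in for southamTweets.text (a list, as lapply() returns)
texts <- list("I want PIZZA now", "I want to sleep", "pizza party tonight", "Pizzas are great")

# grep() coerces the list to character and returns the indices of matching elements
idx <- grep("pizza", texts, ignore.case = TRUE)
idx            # 1 3 4
length(idx)    # number of matching tweets: 3

# value = TRUE returns the matching texts instead of their positions
grep("pizza", texts, ignore.case = TRUE, value = TRUE)
```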

The procedure is repeated for Seattle:

seattleTweets = searchTwitter("I want", n = 1000, geocode = "47.606,-122.299,20mi")
seattleTweets.text = lapply(seattleTweets, function(tweet) tweet$getText())
seattle.spec = grep("pizza", seattleTweets.text, ignore.case = TRUE)

Variations of the “I want a pizza” phrase were also tried:

seattleSpecific.spec = grep("I want pizza", seattleTweets.text, ignore.case = TRUE)



The same searches were then run with “sleep” and “I want to sleep” in place of “pizza”:

southamTweetsSleep.spec = grep("sleep", southamTweets.text, ignore.case = TRUE)
southamTweetsSleepSpecific.spec = grep("I want to sleep", southamTweets.text, ignore.case = TRUE)
seattleSleep.spec = grep("sleep", seattleTweets.text, ignore.case = TRUE)
seattleSleepSpecific.spec = grep("I want to sleep", seattleTweets.text, ignore.case = TRUE)
# stored under a separate name so it does not overwrite the "I want to sleep" result
seattleSleepVariant.spec = grep("I want sleep", seattleTweets.text, ignore.case = TRUE)

In another variant of the experiment, tweets containing “Monday” and “Friday” were retrieved and then filtered for the phrases “I hate” and “Thank God”, respectively:

seattleMonday = searchTwitter("Monday", n = 1000, geocode = "47.606,-122.299,20mi")
seattleFriday = searchTwitter("Friday", n = 1000, geocode = "47.606,-122.299,20mi")
southamMonday = searchTwitter("Monday", n = 1000, geocode = "50.903,-1.40625,20mi")
southamFriday = searchTwitter("Friday", n = 1000, geocode = "50.903,-1.40625,20mi")

southamMonday.text = lapply(southamMonday, function(tweet) tweet$getText())
southamFriday.text = lapply(southamFriday, function(tweet) tweet$getText())
seattleFriday.text = lapply(seattleFriday, function(tweet) tweet$getText())
seattleMonday.text = lapply(seattleMonday, function(tweet) tweet$getText())

seattleMonday.spec = grep("I hate", seattleMonday.text, ignore.case = TRUE)
seattleFriday.spec = grep("Thank God", seattleFriday.text, ignore.case = TRUE)
southamFriday.spec = grep("Thank God", southamFriday.text, ignore.case = TRUE)
southamMonday.spec = grep("I hate", southamMonday.text, ignore.case = TRUE)

A chi-square test was then run on the resulting data using chisq.test().

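The results were plotted using the following commands:

# rchisq(n, df) draws random chi-square values; when n has length > 1, its length
# (here, the number of Friday matches) is used as the number of draws, and df is
# taken from the Monday match positions (recycled as needed)
x <- rchisq(southamFriday.spec, southamMonday.spec)
hist(x, prob = TRUE)
curve(dchisq(x, df = 5), col = "green", add = TRUE)   # chi-square density, 5 df
curve(dchisq(x, df = 10), col = "red", add = TRUE)    # chi-square density, 10 df
lines(density(x), col = "orange")                     # kernel density estimate of x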



Both histogram and density line plots have been used to depict the results.

Result:

Broadly, the terms “I want” and “pizza” appeared together in only six of the 1000 Seattle tweets, and the exact phrase “I want pizza” returned three tweets.

One limitation of searchTwitter() is that “I want” is not treated as a contiguous phrase, so the query also returned tweets such as “I really think I want…” or “I don’t think he wants…”.
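One possible workaround, sketched here as an assumption rather than what the project did, is to post-filter the returned texts locally so that only tweets containing “I want” as a contiguous phrase are kept. Word boundaries (`\\b`) also exclude forms such as “I wanted”. The tweet texts below are made up.

```r
# Made-up tweet texts standing in for searchTwitter() results
texts <- c("I want pizza", "I wanted pizza yesterday",
           "I don't think he wants pizza", "i want to sleep")

# \\b marks a word boundary, so "I wanted" and "he wants" are not matched;
# perl = TRUE selects the PCRE engine, and ignore.case also matches "i want"
hasPhrase <- grepl("\\bI want\\b", texts, ignore.case = TRUE, perl = TRUE)
hasPhrase                        # TRUE FALSE FALSE TRUE

# Keep only tweets that contain the phrase and also mention pizza
wantsPizza <- hasPhrase & grepl("pizza", texts, ignore.case = TRUE)
texts[wantsPizza]                # "I want pizza"
```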
In Seattle, 10 of the 1000 tweets contained the term “sleep”. However, “I want to sleep” returned no matches, and “I want sleep” returned just one.




In Southampton, only one of the 1000 tweets expressed a desire for pizza: only one tweet contained both “I want” and “pizza”, and “I want a pizza” returned no results. Although the counts are very small, this suggests that pizza is more popular in cosmopolitan Seattle than in the relatively more conservative Southampton.
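With counts this small, a formal comparison helps judge whether the difference could be due to chance. As an illustration only (the report does not apply this test), the sketch below runs Fisher's exact test, which suits 2×2 tables with very small cell counts, on the counts reported above: 6 of 1000 Seattle tweets versus 1 of 1000 Southampton tweets contained both “I want” and “pizza”.

```r
# Counts reported above: tweets containing both "I want" and "pizza", out of 1000 per city
pizza <- matrix(c(6, 994,
                  1, 999),
                nrow = 2, byrow = TRUE,
                dimnames = list(city = c("Seattle", "Southampton"),
                                pizza = c("yes", "no")))

# Two-sided Fisher's exact test of independence between city and mentioning pizza
res <- fisher.test(pizza)
res$p.value
```

For these counts the p-value comes out well above 0.05, so the difference alone is weak evidence.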

In Southampton, the query for “sleep” returned 23 tweets, and “I want to sleep” returned two, marginally more than Seattle, where that phrase returned none.

In the experiment with tweets posted on Mondays and Fridays, users in both cities appear to rant more on Mondays than they express thanks on Fridays. In Seattle, the search for “I hate” and “Monday” returned 54 tweets, while, surprisingly, “Thank God” and “Friday” returned just one. Southampton returned 8 tweets for the Monday query and two for the Friday query.

Thus, Southampton produces an almost symmetric plot compared with Seattle, where the difference between Monday and Friday is much more substantial.
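As a sketch of how chisq.test() could be applied to these counts (the report does not list the exact call, so this framing is our assumption), a goodness-of-fit test per city checks whether the Monday (“I hate”) and Friday (“Thank God”) counts depart from an even split.

```r
# Counts reported above: "I hate" + Monday vs "Thank God" + Friday
seattle <- c(monday = 54, friday = 1)
southampton <- c(monday = 8, friday = 2)

# Goodness-of-fit test against equal proportions (default p = c(0.5, 0.5))
seattleTest <- chisq.test(seattle)
southamTest <- chisq.test(southampton)

seattleTest$statistic   # about 51.07, df = 1
southamTest$statistic   # 3.6, df = 1
c(seattle = seattleTest$p.value, southampton = southamTest$p.value)
```

Under this framing, the Seattle split is far from even (p-value well below 0.001), while Southampton's 8 vs 2 split does not reach significance at the 5% level (p ≈ 0.058), in line with the near-symmetric Southampton result described above.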
