Big Data Social Network Analysis
by
Chamin Nalinda
(Registration No : 2011/CS/005, Index No : 11000058)
chmk90@gmail.com
+94 772416604
SCS 3017
Literature Survey
Supervised by
Dr. H. A. Caldera
BSc(Colombo), PGDip(Colombo), MSc(Colombo), PhD(Western Sydney)
University of Colombo School of Computing
Colombo 7
SRI LANKA
TexMaker | Mendeley Desktop | Harvard Style Referencing | Word Count = 5466
Declaration
I hereby declare that this literature survey report was written by Chamin Nalinda.
A great deal of analysis was carried out in preparing this report, and the bibliography
reflects the key reference materials. Self-learned knowledge was also included. References
have been cited without reproducing the owners' exact content (paragraphs, sentences,
etc.).
Name of Candidate: L.G.H.C. Nalinda
Signature: ............................... Date: December 12, 2014
Abstract
Big Data Social Network Analysis (BDSNA) is the computational and graphical study
of techniques that can be used to identify clusters, patterns and hidden structures,
and to generate business intelligence, from social relationships within social networks
in terms of network theory. Social Network Analysis (SNA) has a diverse set of
applications and research areas such as health care, travel and tourism, defence and
security, and the Internet of Things (IoT). With the boom of the internet, Web 2.0
and handheld devices, unstructured data is growing explosively in size, complexity and
variety. Analysing it and extracting information from it is therefore of great value,
and adapting the Big Data concept to SNA is vital.
This literature survey aims to investigate the usefulness of SNA in the “Big Data
(BD)” arena. It reviews major research studies that have proposed business strategies
and BD approaches for generating predictive models while addressing the contemporary
challenges that have arisen from SNA.
Acknowledgements
I would like to offer my heartfelt thanks to Dr. H. A. Caldera, my supervisor for the
Literature Survey, for his immense support and continuous feedback during the course
of the survey and for guiding me with valuable ideas.
Further, my sincere gratitude goes to all the lecturers, assistant lecturers and the
entire UCSC family.
Special thanks to my parents, brother and sister who have always given me the
strength through the journey of my life.
Chamin Nalinda, December 12, 2014
List of Figures
2.1 Sources Used to Find or Access Health and Wellness Related Information in 2008, in United States of America (USA)
2.2 9/11 attackers having weak ties with others
2.3 Decentralized terrorist network
2.4 PISTA ontology
2.5 Most consulted Social Networks (SNs) in cybertravelling
2.6 Traveller recommendation system
2.7 TAM to gain loyalty
2.8 Tweeting trees
3.1 Expected growth in real time analytics by 2015
3.2 Capabilities of Operational Intelligence
3.3 Overview of Lambda Architecture
3.4 Architecture for Social Internet of Things (SIoT) Client Side and Server Side
4.1 Hosting data on cloud and challenges
Acronyms
AMA American Medical Association
ANN Artificial Neural Network
API Application Programming Interface
BD Big Data
BDA Big Data Analytics
BDSNA Big Data Social Network Analysis
BI Business Intelligence
BP Batch Processing
CC Cloud Computing
CO Cognitive Objects
DARPA Defense Advanced Research Projects Agency
DB Database
DD Deep Data
DM Data Mining
DMO Destination Management Organizations
DT Decision Tree
DW Data Warehousing
eWOM e-word-of-mouth
FB Facebook
FC Fog Computing
IoT Internet of Things
LA Lambda Architecture
NLP Natural Language Processing
NSA National Security Agency
OI Operational Intelligence
OM Opinion Mining
RFID Radio Frequency Identification
ROS Robot Operating System
RTA Real Time Analysis
RTBDA Real Time Big Data Analytics
SIoT Social Internet of Things
SM Social Media
SNA Social Network Analysis
SNs Social Networks
SP Stream Processing
SW Software
TAM Technology Acceptance Model
TM Text Mining
TPA Technosocial Predictive Analytics
UGC User Generated Content
US United States
USA United States of America
WSNs Wireless Sensor Networks
WWW World Wide Web
Chapter 1
Introduction
This literature survey is based on key domain areas in which "Social Network Analysis"
plays a vital role with the use of "Big Data" technologies. The discovered knowledge can
be utilized to advance the current state of the respective domains. This chapter briefly
highlights the importance, history and growth potential of the survey topic.
1.1 Approach
Social Networks (SNs) connect people with different ideas, education, status,
backgrounds, geographies, etc. The focal idea of Social Network Analysis (SNA) is
identifying the relationships within the network. Information diffusion is the key
to relationship formation. Within SNs, a variety of interests addressing different
domains are shared, and this forms complex relationships. With the World Wide Web
(WWW) and Web 2.0, SNs have gained a new shift and focus. Online SNs are massive
data repositories. Visitors to SNs leave a digital footprint once they are logged in,
and hence all activities of logged-in users can be examined in online SNs. Data scientists
recognized the importance of translating these technological opportunities into revenue,
competitive advantage and useful discoveries that redefine human interaction[6] and
day-to-day life. Otherwise the data would have remained in data tombs and the
opportunities would have been ignored.
BD analytical approaches are a trusted way of analysing SNs. Data Mining (DM)
techniques are heavily used to dig deeper into the data in SNs. Big Data Analytics
(BDA) is a proven method of defining new storage, access, query and scaling mechanisms
for data and of developing new approaches to sentiment analysis, predictive modelling,
Natural Language Processing (NLP), click stream pattern recognition, etc.
BDSNA is a fast-growing research area. There are quite a number of algorithms,
software tools and analytic engines that are optimized[39] for BDSNA. These tools
are capable of gathering data, processing and analysing it, and presenting the results
visually for a particular domain. This literature survey gives an overview of the BDSNA
topic as published in research papers, journals, web articles, books, etc.
1.2 Motivation
“Connectivity” is the concept behind the formation of SNs. The capabilities offered
by SNs, sensors and online networks make them rich data sources. People spend a
substantial amount of time in online networks. SNs therefore generate a high volume of
User Generated Content (UGC) of different varieties at a rapid velocity[40]. This UGC
is a true reflection of human behaviour in SNs, and hence it is of high commercial value.
But its enormity and unstructured nature present multiple challenges, so storage, access,
analytics and high computational performance need to be considered. As a result, “BD”
technology has been combined with SNA to discover new dimensions of knowledge.
Facebook (FB)¹, Twitter², LinkedIn³, Google+⁴, Tripadvisor⁵, Blogger⁶ and Instagram⁷
are the leading SNs with vast user engagement in today's context. UGC in SNs takes
the form of text, emoticons, images, ratings, likes, etc. and addresses many domains
such as travel and tourism[33], defence and security[4], and healthcare and medicine.
The nature and characteristics of these SNs differ; however, there are similarities that
can be aggregated when addressing domains. UGC poses many business opportunities.
Discovering the knowledge that resides in UGC, while analysing the attributes that are
unique to each domain, will create more opportunities for both the private and government
sectors. Another big wave in the coming decade is IoT. It will create still more
semi-structured and unstructured UGC, and more varieties of SNs. As a result, “BD” will
move to the “Deep Data (DD)” concept, while “Cloud Computing (CC)” will move to
“Fog Computing (FC)”.
¹ https://www.facebook.com/
² https://twitter.com/
³ https://www.linkedin.com/
⁴ https://plus.google.com/
⁵ tripadvisor.com/
⁶ https://www.blogger.com/
⁷ http://instagram.com/
Deducing business intelligence by connecting the dots using Operational Intelligence
(OI), and comparing and applying the discovered knowledge in the modern and future
societal context using classification, sentiment analysis and other techniques in the BD
and DM paradigm, are blooming research topics. In these research areas, it is essential
to determine which BDSNA algorithms and techniques can accommodate growth in size,
scalability, quantification issues, pattern recognition issues and real time analytics
in SNA application areas. Data scientists and other researchers are also seeking novel
ways of redesigning the infrastructure to facilitate BDSNA given the rapid growth of IoT.
1.3 History
There are various claims about who initiated SNA, but an experiment carried out
by Stanley Milgram in 1967 provides a proper grounding for it. He came up with the
“six degrees of separation”[37] concept, in which he stated that most people are connected
through six acquaintances. SixDegrees.com was the first recognized online social network.
The BDSNA research arena boomed with Web 2.0, which came to light in 1999[46].
The low availability of internet facilities and the lack of Software (SW) tools to meet BD
requirements were major reasons for BDSNA to stay out of sight in the early days.
1.4 Current status
Today millions of people are connected with social networks in many different
ways[28]. Social networks are in a neck-and-neck fight to keep their current users while
attracting new ones. This leads to semi-structured and unstructured data being
generated at a rapid pace.
BDSNA is an aggressive and lucrative research area in modern computer science.
Public and private sector organizations have opened up their data repositories
for research purposes[13] and have encouraged data scientists to actively engage in
more research areas in BDSNA. Tech giants like Google⁸, Microsoft⁹, FB, Amazon¹⁰
and IBM¹¹ are investing in start-up companies that operate in BDSNA because of its
lucrative nature and growth potential. The demand for business intelligence tools
is erupting[10]. High performance, low latency, parallel distributed processing, real
time processing, scalability and migration are factors that are continuously optimized in
such tools. Further, with IoT a new era has been born in which trees tweet about their
condition[12][15].
1.5 Chapter summary
In its early days BDSNA was not popular, for various reasons, and it emerged
with Web 2.0 technology. SNs connect people with different views and opinions.
The UGC data repositories of SNs are huge and come in different varieties and
variations. BDSNA helps in analysing UGC in SNs and thereby discovering knowledge.
This knowledge has a high commercial value as well. Today, there are different
forms of online SNs that address different user groups (FB, LinkedIn, TripAdvisor,
etc.). Classification, sentiment analysis, clustering, Real Time Analysis (RTA) and
various other BD and DM techniques are widely used in SNA. Today, advances in
technology have spread to SNA, where tech giants and data scientists are looking
for novel approaches to accommodate the needs of SNA, such as storing, querying,
accessing and analysing UGC, with much improved technologies.
The next chapter gives a detailed illustration of the four major SNA domains that this
literature survey is mainly concerned with. Examples and use cases from survey reports,
articles and journals are included in order to explore the importance of BDSNA to the
respective domains.
⁸ https://www.google.com/
⁹ http://www.microsoft.com/
¹⁰ http://www.amazon.com/
¹¹ http://www.research.ibm.com/
Chapter 2
Big Data Social Network Analysis
Domains
In this chapter, SNA domains in health care, defence and security, travel and tourism,
and Web 2.0 and IoT are discussed. Examples illustrate how BDSNA has been used to address
stakeholder intentions and expectations. Further, this chapter presents the specializations
in each domain that have emerged as a result of BDSNA.
2.1 Health care
As shown in Figure 2.1, people are highly likely to use the internet as a source of
health and wellness related information, and they are more likely to spend much of
their time in SNs in their day-to-day lives. Web 2.0 attracts users of all age groups.
Discussion, information diffusion and collaboration over SNs are growing rapidly in the
healthcare space. Recent research has identified that healthcare professionals are
willing to use SNs as a means of addressing their patients and monitoring their patients'
health conditions. Further, patients who have recovered are also interested in sharing
their success stories in SNs in the form of blog posts, photos, videos and articles.
This information is publicly available to a wide variety of people. We are now in
Health 2.0, “the use of social software and its ability to promote collaboration between
patients, their caregivers, medical professionals, and other stakeholders in health”[17].
Figure 2.1: Sources Used to Find or Access Health and Wellness Related Information
in 2008, in USA
It is “Collective Wisdom” that acts as the driving force for people to increasingly use
SNs to find information relevant to their health matters. There are specifically developed
SNs like PatientsLikeMe¹, OrganizedWisdom², ICYou³, Google Health Groups, Sermo⁴ and
DailyStrength⁵ that bridge the knowledge and experience of patients and the expertise of
healthcare professionals[17]. The American Medical Association (AMA) emphasizes to
physicians, neurosurgeons and other professionals the importance of adhering to
professionalism when publishing content over SNs, in order to safeguard their career
status in healthcare[9]. Even though there are challenges in collecting data, the
healthcare data reflected in SNs is reported to be over 99.7% accurate[20].
¹ http://www.patientslikeme.com/
² http://www.organizedwisdom.com/Home
³ http://icyouhealth.tumblr.com/
⁴ https://www.sermo.com/
⁵ http://www.dailystrength.org/
A study of cancer patients' social behaviour on FB, conducted by the University of
Texas M.D. Anderson Cancer Center, has enabled the centre to provide a better service
to its patients. The UGC took the form of posters and text. This technique is called
“Telemedicine”. Like Health 2.0, Medicine 2.0 is another concept that is evolving with
high user participation over SNs to communicate and collaborate on health care. The
Twitter network is also widely popular among patients and healthcare professionals as a
medium of communication[20]. However, patients must be willing to communicate openly
over SNs; otherwise regulations will treat it as a violation of patients' rights.
Videos, articles, comments, chats, images and other forms of healthcare-related UGC
available on SNs represent a gold mine of opportunities[20][17]. Sophisticated
applications have been developed that integrate both DM and BD techniques. TrialX⁶ is
one such application that patients can use. Once a patient tweets, TrialX sends a
response tailored to the patient's health history[20][32].
Genetic engineering, drug research, disease research and public health domains utilize
UGC on SNs to discover knowledge and thereby develop models to improve people's
health. Twitter hashtags are quite useful when determining disease- or drug-related
effects[8]. An automated filtering system developed by the US Food and Drug
Administration has shown that 98% of tweets are bogus; however, the genuine
information is of great value[23].
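As a rough illustration of how hashtags can help surface drug-related effects, the sketch below counts which hashtags appear in the same tweet as a drug hashtag. The tweets, the drug tag `#DrugX` and the function name `cooccurring_hashtags` are hypothetical and are not taken from the cited studies; a real pipeline would first need the kind of spam filtering described above.

```python
import re
from collections import Counter

# Matches "#word" and captures the word without the leading "#".
HASHTAG = re.compile(r"#(\w+)")

def cooccurring_hashtags(tweets, target):
    """Count hashtags that share a tweet with `target` (case-insensitive)."""
    target = target.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if target in tags:
            counts.update(tags - {target})
    return counts

# Hypothetical sample data, for illustration only.
sample_tweets = [
    "Started #DrugX last week, constant #headache since",
    "#drugx gave me a #headache and #nausea",
    "Great run today #fitness",
    "No side effects on #DrugX so far",
]
counts = cooccurring_hashtags(sample_tweets, "#DrugX")
print(counts.most_common())
```

Tags that co-occur often with the drug tag are candidates for closer inspection; on their own they say nothing about causation.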
Information extraction is critical. Documentation migrated from paper to digital
form, together with SNs data, forms huge repositories. An automated surveillance system
would be very effective at extracting information, analysing data and recognizing
patterns. One such system, implemented at the University of Alabama, has proven results.
It has been successful in identifying high-risk patients, short-term health issues and
adverse effects of drugs. The use of big data has made it possible to deliver tailored
prescriptions to patients[30].
A significant number of BD applications exist in the healthcare domain today. A SW
tool similar to Asthmapolis⁷ would be meaningful to implement on top of the SNs
data repository. Big data tools are expected to be mobile, hence tools built for mobile
platforms will have a lot of growth potential. Ginger.io⁸ and mHealthCoach⁹ are
the leading tools[18] at present, but these two have been unable to incorporate the SNs
domain into their applications, and the need for such tools remains.
⁶ http://trialx.com/enablers/
⁷ http://propellerhealth.com/
2.1.1 Challenges and Future
UGC that appears on anonymous blogs, and spam comments, are unreliable sources.
Efficient NLP and Text Mining (TM) techniques need to be utilized when developing
BD tools and applications. Strong rules and regulations exist in the healthcare
domain. This is a barrier to obtaining useful information from SNs. Mere sentiments are
not enough to develop solid algorithms and models; patient information and other
related information will add much value to research. "Privacy" concerns are another
barrier. People might not want others to use what they share on SNs.
Web 2.0 will evolve to Web 3.0, and eventually Health 2.0 and Medicine 2.0
will evolve to Health 3.0 and Medicine 3.0[17]. With the rise of IoT, BD wearables
will take priority in healthcare[7]. The concept of SNs-connected BD wearables will
redefine human interaction with healthcare matters.
2.2 Defence and Security
With the 9/11 massacre in the United States (US), the National Security Agency
(NSA) invested a huge amount of resources in countering terror networks. “Networks
and Netwars” by John Arquilla and David Ronfeldt, written prior to the 9/11 massacre,
highlighted the behavioural patterns of criminal networks. Modern war network
structures are leaderless and extremely quick, hence novel approaches are needed to
counter terror threats. Valdis Krebs mapped the Al-Qaeda network responsible for
9/11[37]. More and more importance was given to SNA for tracing terror networks, and
today SNA plays a key role in dismantling them[11]. Technosocial Predictive Analytics
(TPA) methods for web DM and social web tools are needed to capture and query
UGC in SNs[22].
⁸ https://ginger.io/
⁹ http://www.mhealthcoach.com/
National security is the main concern. The defence domain differs from other SNs
domains in many ways, since the key players are not openly active. Weakly tied parties
are somewhat open in SNs, but even they hardly communicate. SNA in defence
requires two major parties: data collectors and data modellers. Data collectors have a
cumbersome time gathering data for the above reason. The University of Arizona
Artificial Intelligence Center¹⁰ offers a large data repository of terror-related
newspaper articles, web pages and social network data. Clustering techniques have been
used to segregate possible terror networks, and researchers have managed to represent
pictorially the diffused networks linked by weak ties (Figure 2.2) in the network of the
9/11 attackers[37].
Figure 2.2: 9/11 attackers having weak ties with others
2.2.1 Identifying key players in network
Two main focuses of analysing SNs in the defence domain are identifying the structure
of possible networks and recognizing key players. With the 9/11 attack, the structure
became decentralized (although both centralized and decentralized networks still exist).
Identifying the key players helps in taking control of the entire network. Though
it sounds easy, factors such as incompleteness, fuzzy boundaries and network dynamics
make it a tough task. In a decentralized network, players exist to handle financial
assistance and other supplies, while the leader plays a silent role in managing[4]
(Figure 2.3). BDSNA is used to identify the financial manager and thereby recognize key
roles. Twitter BD analytic techniques are most likely to be used in recognizing key
players[31].
¹⁰ http://ai.arizona.edu/research/terror
Figure 2.3: Decentralized terrorist network
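One common way to look for key players in such a network is to compare centrality measures. The sketch below is an assumption about how this could be done, not the method used in the cited work: it computes degree and betweenness centrality (using Brandes' algorithm) on a small made-up graph in which a hypothetical "financier" node links two otherwise separate cells. All node names and edges are invented for illustration.

```python
from collections import deque

def build_graph(edges):
    """Undirected graph as {node: set(neighbours)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree(graph):
    """Number of direct ties per node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                              # nodes in order of discovery
        preds = {v: [] for v in graph}          # predecessors on shortest paths
        paths = {v: 0 for v in graph}           # number of shortest paths
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):               # farthest nodes first
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: two tightly knit cells joined only by a financier.
edges = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
    ("financier", "a1"), ("financier", "b1"),
]
graph = build_graph(edges)
deg = degree(graph)
btw = betweenness(graph)
key_player = max(btw, key=btw.get)
print(key_player, deg[key_player], btw[key_player])
```

In this toy graph the financier has only two direct ties but lies on every shortest path between the two cells, so betweenness singles it out where degree alone would not.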
As shown in Figure 2.4, the PISTA architecture is quite useful in filling major
loopholes in the national security domain. At the moment, however, this architecture
has few applications that integrate SNs UGC. It is highly recommended to invest in
extending the functionality of the PISTA architecture to support BDSNA in security
domains, since most SNs have video sharing and geo-location features[42].
Figure 2.4: PISTA ontology
2.2.2 Use cases from recent history
In recent history there have been several major incidents around the globe that
involved Web 2.0 initiatives. This section highlights some of them and the BDSNA
technologies used in those situations.
• The 2008 Egyptian Revolution started with the creation of a FB group. It revealed
the importance of paying attention to SNA[14].
• The 2009 reinstatement of the Chief Justice of Pakistan was driven purely by the
influence of SNs. The government banned private media, yet people raised social
awareness through SNs, so the government had to reinstate the Chief Justice[14].
• ISIS is a technologically sophisticated terror group that actively engages in SNs.
ISIS uses strong encryption techniques when communicating over SNs. Because of
this barrier, BDSNA approaches like NLP, graph databases (to determine hierarchies
and identities) and cognitive computing platforms cannot be used on their own.
Project Minerva, run by the US Department of Defense, utilizes high-end
algorithms to detect terror activities from data pulled from Twitter[47].
• The FIFA World Cup 2014 can be considered an event that used BDSNA to
keep the peace around the grounds and nearby cities. Brazilian security forces used
real time Twitter feeds, FB feeds and other SNs UGC, and analysed their semantics,
to determine where to send troops to control riots. Security agencies used a powerful
BD solution, Oracle Complex Event Processor¹¹, to run real time queries on SNs
feeds.
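The kind of real time query described in the last use case can be pictured as a sliding-window count of posts per location. The sketch below is a plain-Python stand-in under stated assumptions; it does not use or imitate the Oracle product's API. The class `SlidingWindowCounter`, the helper `hot_spots`, the area names and the timestamps are all hypothetical, and the feed is assumed to be already filtered to riot-related posts and to arrive in time order.

```python
from collections import defaultdict, deque

class SlidingWindowCounter:
    """Counts events per key over the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = defaultdict(deque)   # key -> timestamps, oldest first

    def add(self, key, timestamp):
        # Assumes timestamps for a key arrive in non-decreasing order.
        self.events[key].append(timestamp)

    def count(self, key, now):
        timestamps = self.events[key]
        # Evict everything that has fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps)

def hot_spots(counter, keys, now, threshold):
    """Keys whose recent event count reaches the threshold."""
    return sorted(k for k in keys if counter.count(k, now) >= threshold)

# Hypothetical feed of (seconds, area) pairs.
feed = [
    (0, "stadium_gate"), (5, "city_centre"), (10, "stadium_gate"),
    (50, "stadium_gate"), (55, "stadium_gate"), (57, "city_centre"),
    (58, "stadium_gate"),
]
counter = SlidingWindowCounter(window=60)
for ts, area in feed:
    counter.add(area, ts)

alerts = hot_spots(counter, ["stadium_gate", "city_centre"], now=60, threshold=3)
print(alerts)
```

A production system would add semantic analysis of the text and geo-resolution of the posts; the window and threshold here are arbitrary values chosen for the example.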
2.2.3 Challenges and Future
BD analytics in the defence sector provides meaningful insight to governments. The
director of the Defense Advanced Research Projects Agency (DARPA)¹² in the US
emphasizes the importance of algorithm optimization in discovering useful intelligence.
“e-harassment”, “cyberbullying” and “hacking” are major areas of investigation. The
adoption of SNs data is still at an early stage, but considering recent history it is
apparent that it is essential to take SNs data into consideration when discussing the
security domain. The big argument against BDSNA in defence is the violation of privacy.
People share their thoughts on FB, Twitter and other SNs because they have a right to,
not so that those thoughts can be used for other purposes. The recent whistle-blowing
incidents involving Julian Assange and WikiLeaks, and Edward Snowden and PRISM, are
examples. It is apparent that governments try to hide this information from public
view[11]. To obtain successful results there should be a balance between government
policy towards SNs and users' attitudes.
¹¹ http://www.oracle.com/technetwork/middleware/complex-event-processing/documentation/index.html
¹² http://www.darpa.mil/default.aspx
2.3 Travel and Tourism
Tourism has always been a networked industry. Web 2.0 redefined tourism and all
related industries. This phenomenon is Tourism 2.0[26]. In tourist networks, two
major types of stakeholder can be identified among the participants (tourists, travel
agents, accommodation providers, restaurants, etc.)[41]: tourists and service providers.
BDSNA has been viewed in different ways in the tourism domain. Two broad views are,
first, using SNs as a tool for choosing tourist destinations[33][26] and, second,
processing SNs to discover interesting patterns and applying the derived knowledge to
tourism[34][16].
2.3.1 Web 2.0 forms Tourism 2.0
SNs are powerful tools that draw on the Technology Acceptance Model (TAM) and
e-word-of-mouth (eWOM). TAM describes users' willingness to adopt technologies,
while eWOM is content shared on SNs in the form of text, images, videos, etc. TAM
and eWOM provide the primary source of information for cybertravellers.
Cybertravellers' behaviour depends on what other people say about destinations
(Figure 2.5). This approach highlights the need for a new framework to address
destination governance. Service providers need to adapt their networks by embedding
SNs features that support searching, visualization and interactivity, which would
trigger a positive attitude towards travelling. Travel 2.0 SNs (TripAdvisor, WAYN¹³,
Tripwolf¹⁴, Travelblog¹⁵, Trivago¹⁶) provide SNs features to address cybertraveller
expectations. Here the focus is more on leisure travellers than on business
travellers[33][26][35][36].
¹³ http://www.wayn.com/
¹⁴ http://www.tripwolf.com/
¹⁵ https://www.travelblog.org/
¹⁶ http://www.trivago.com/
Figure 2.5: most consulted SNs in cybertravelling
2.3.2 Tourism 2.0 Destination Management
Tourist attitudes, behaviour and psychology have a huge impact on the choice of
destinations to explore. Different market segments have different demands. eMarketers
use tailored strategies to attract potential tourists. Destination Management
Organizations (DMO) utilize DM and BDSNA techniques (clustering, Artificial
Neural Network (ANN), Decision Tree (DT)) to determine customer intentions from the
mixture of facts and opinions in UGC on SNs[43].
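To make the clustering step concrete, the sketch below groups travellers into market segments with a small k-means implementation. The text names clustering in general rather than a specific algorithm, so k-means is an assumption made here for illustration; the two features (spend per night, trip length in nights) and all data points are likewise invented.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical travellers: (spend per night, trip length in nights).
travellers = [
    (40, 10), (50, 12), (45, 9),     # long, low-budget trips
    (300, 3), (320, 4), (280, 2),    # short, high-spend trips
]
centroids, clusters = kmeans(travellers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

With real UGC the features would first have to be extracted from text, ratings and profiles, and the number of segments would need to be chosen rather than fixed at two.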
Travel 2.0 benefits from BDSNA in demand and sales forecasting, inventory management,
multichannel marketing campaign organization, etc. SNA methods are important for
removing noise and discovering meaningful knowledge from SNs[1]. RapidMiner¹⁷
analyses traveller patterns and renders dynamic, personalized suggestions based on
past results as well as results from other linked networks (predicting air ticket
prices, hotel charges, etc.)[44].
At the time of decision making, a traveller switches from one option to another
depending on reviews. FB pages provide great insights about destinations and hotels.
However, research has shown that travellers are very likely to tweet or post on FB if
they have had a bad experience with a service provider. Twitter users are
more likely to retweet negative reviews than positive ones. This highlights the
importance of monitoring UGC on the SNs pages of service providers. Once travellers
have selected their preferred hotel or travel service, they are very likely to visit the
brand website of the hotel or travel agency. It is vital to integrate TAM features that
let customers explore more of the services on offer, in order to win customer
loyalty[36]. These strategies ultimately help DMO to boost their revenue and gain a
competitive advantage over peers that ignore BDSNA.
¹⁷ https://rapidminer.com/
Figure 2.6: Traveller recommendation system
2.3.3 Challenges and Future
This section describes the prevailing restrictions on BDSNA in the tourism and travel
domain and how the future may look.
• Currently, most databases are relational. The tourism and travel sector needs new
infrastructure tools to get the most out of BDSNA.
• User opinions are subjective. Algorithms should support viewing the generalized
opinion of travellers and should not be affected by outliers.
• Content shared on FB, Twitter and other SNs has a direct influence on
DMO, travel agents and hotels, so a strong monitoring mechanism needs to be
incorporated into Travel 2.0 websites.
• Airline service providers can benefit from real time data analytics on flight
delays; UGC from SNs and sensor data (weather patterns, etc.) serve greatly
when optimizing operations.
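The point about outliers in the list above can be illustrated with robust summary statistics. The sketch below compares the plain mean of a set of hotel ratings with the median and a trimmed mean, both of which are less sensitive to a single extreme review. The ratings and the helper `trimmed_mean` are hypothetical; which robust estimator suits a given recommendation system is a design choice that the survey does not prescribe.

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return mean(kept)

# Hypothetical ratings on a 1-5 scale with one very negative outlier.
ratings = [5, 4, 5, 4, 5, 4, 5, 4, 5, 1]

plain = mean(ratings)
robust_median = median(ratings)
robust_trimmed = trimmed_mean(ratings, proportion=0.1)
print(plain, robust_median, robust_trimmed)
```

Here the single one-star review pulls the plain mean down, while the median and the trimmed mean stay close to what most travellers reported.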
Figure 2.7: TAM to gain loyalty
2.4 Web 2.0 and IoT
In an era in which Web 2.0 evolves to Web 3.0 (Ubiquitous Computing), software
embedded in hardware will take the lead in the daily routines of mankind and will have
a huge influence on current SNs practices as well. Today it is mostly humans that are
connected to SNs. With advances in IoT, Cognitive Objects (CO), or smart objects, are
capable of sharing UGC over SNs. Tweeting trees (Figure 2.8) and tweeting washing
machines send real time content to humans[24][27]. Two broad kinds of SNs exist in the
SIoT: human-to-CO SNs and CO-to-CO SNs[19].
Figure 2.8: Tweeting trees
Developers integrate SNs capability into every smart device because SNs play an
important role in personal life. Google Glass¹⁸, Samsung Galaxy Gear watch¹⁹, Apple
iWatch²⁰ and many other wearable technologies have integrated SNs capability. Lewis
Robinson, in his article for SocialMediaToday²¹, stated that “iWatch will check in for
you via Facebook when you arrive at an event. Your oven will take a photo of the
cake you just baked and post it directly to Instagram”[38]. It is evident that automated,
interconnected smart devices can act without human intervention.
There will be more data than ever before. BDSNA will be able to provide more
personalized information to all stakeholder groups, and advanced Business Intelligence
(BI) can be derived using sophisticated analytical approaches. The concept of “Smart
Cities” is an example of advanced data analytics applied to UGC from IoT devices
that are connected to SNs and other CO. Waze²² is one such real time traffic
application that connects mobile devices with other CO (traffic lights, street signs,
etc.).
2.4.1 Challenges and Future
“Privacy” is again a major concern in this arena. Since devices having capability of
generating automated content sharing on SNs, it could be a violation of privacy of in-
dividuals. How ever, Lawrence Ampofo on his recent article to Business2Community23
emphasizes that, “conception of privacy become more sophisticated” where people are
more likely to openly communicate their personal life through social networks and
“data to be more liberated from wall gardens making available to all platforms”[2].
It is predicted that by the end of 2020, the number of IoT devices would rise
above 50 billion[38]. The potential for new concept SNs is massive. The amount of
unstructured data that is generated from IoT devices will be so huge that even current
bd technologies cannot accommodate the size, growth and scalability. The concept
of “Deep Data” and “Fog Computing” need to be utilized effectively to accommodate
infrastructure requirements.
18 https://www.google.com/glass/start/
19 http://www.samsung.com/uk/consumer/mobile-devices/wearables/gear/
20 https://www.apple.com/watch/
21 http://www.socialmediatoday.com/
22 https://www.waze.com/
23 http://www.business2community.com/
2.5 Chapter summary
Health 2.0 and Medicine 2.0 approaches have evolved as a result of Web 2.0 because
the potential of SNs for the health care industry was recognized as massive. SNs are
among the fastest channels of communication between patients and health care
professionals such as nurses, doctors and specialists. PatientsLikeMe, Google health
groups and various other SNs that focus on health care share the knowledge
and experiences of all parties involved. Sophisticated SW tools such as
TrialX use BDSNA methods to send tailored responses to patients and doctors by
analysing the data that related parties publish on SNs. Specialized research areas
such as drug research and disease research make heavy use of BDSNA approaches such
as TM, sentiment analysis, clustering and RTA.
The Defence and Security domain is very different from other domains in BDSNA.
Finding a reliable data repository is a major challenge because terror groups
hardly reveal any data, although the recent ISIS scenario is quite different. Today,
government agencies and authorities use BDSNA to establish security in their territory.
RTA plays an important role in analysing the UGC of SNs. Highly sophisticated models
and predictive algorithms have been developed using BDSNA mechanisms.
Travel 2.0 evolved with Web 2.0. UGC in the form of text, video and
images is a useful resource for discovering traveller psychology and behaviour.
Business models like TAM were developed as a result of BDSNA. Hotel owners and travel
agencies use BDSNA approaches to address their customers' requirements.
RTA and recommendations are heavily used in Travel 2.0.
IoT has paved the way for living things like trees and non-living objects such as
washing machines to share their status over SNs. As smart devices become
part of SNs, the amount of unstructured and semi-structured data that is generated
is enormous. This pushes data scientists to explore new technologies
like FC and DD and to integrate them into BDA.
The third chapter focuses on core BDA technologies in SNA. Network visualization,
data storage, processing and access, recommendation systems and RTA, which were
discussed for the SN domains above, are explained both technically and theoretically.
Chapter 3
BDSNA Tools and Technologies
In this Web 2.0 era, data generation is growing exponentially, and data scientists
and IT professionals are keen to turn BI into an asset in their business
domain. This chapter describes the key concerns, core technologies and tools in
BDSNA.
3.1 Major Concerns in BDSNA
This section summarizes the issues identified in the previous chapter.
• Security and Privacy: Most UGC on SNs reflects moments of people’s personal lives.
All scenarios considered in the last chapter highlight security and privacy
as a major concern[11][2].
• Explosive growth rate: The growth of the Internet and IoT will generate more
UGC. Infrastructures should be able to capture, store, process and analyse
new sources of semi-structured and unstructured data from all SNs. FB
uses Apache Hadoop1 and Apache Hive2 for storage because of their high hardware
scalability, and Scribe3 for log collection[45].
1 http://hadoop.apache.org/
2 https://hive.apache.org/
3 https://github.com/facebookarchive/scribe
• Extract valid UGC by removing noise: TM, NLP and other DM techniques need
to be optimized to assess the validity of data[2].
• Real time analytics: The need for Stream Processing (SP) is growing rapidly. User
data gathered over a period goes through a Batch Processing (BP) mechanism to
develop models, which are then used to check and analyse incoming events in real time.
• Sophisticated analytics tools and SW: Low latency and richer visualization are
expected from BDSNA tools and SW. The Lincoln Laboratory is currently
engaged in research projects to develop sophisticated algorithms and software
tools to generate networks from unstructured/semi-structured data[10].
A wide range of tools and SW is available in the marketplace to represent different
user groups in SNs. When selecting the right tool, factors such as the intended
goal, ease of use, operating platform and cost effectiveness need
to be taken into consideration. Of all these, “visualization” capability is vital.
Strengths of network ties, user group structures and dynamics can be viewed
using these tools[29].
Tool / SW     Description
Gephi4        Platform-independent SW distributed under an open source licence.
              A good tool for visualizing networks and their relationships.
NetLogo5      Free, platform-independent SW. Helps in visualizing the dynamics of
              network formation and can be used to study network behaviour.
iGraph6       Free SW that can be used to perform heavy calculations.
Pajek7        Free SW that runs only on the Windows platform. Offers network
              formation, dynamics, information diffusion and many other built-in
              features.
UCINet8       Commercial SW that supports only the Windows platform.
NodeXL9       Fairly new to the market. Integrates SNA with Excel. Free SW that is
              currently available only for the Windows platform.
NetworkX10    A Python package that is well suited to programmatic analysis. It can
              interface with numerical code written in C and Fortran, which helps
              when scaling to large matrices.
4 http://gephi.github.io
5 https://ccl.northwestern.edu/netlogo/
6 http://igraph.org/
7 http://pajek.imfm.si/doku.php
8 https://sites.google.com/site/ucinetsoftware/home
9 http://nodexl.codeplex.com/
Neuro Productions' 5K Twitter Browser and Neoformix's Twitter StreamGraph are
advanced visualization tools that can be used to analyse UGC from Twitter[24].
3.2 Real Time Analysis
FB, Twitter, LinkedIn, Google+, TripAdvisor and all other leading SNs provide
real-time visibility into what their users prefer. The Intel BD Research Center
forecast that use cases in SNA will shift towards Real Time Big Data Analytics
(RTBDA) rather than BP, yet BP will still act as the core of RTBDA. Real-time analytics
based on SP, Operational Intelligence (OI)[39] and Lambda Architecture (LA) are the
core BDA technologies that SNs mainly use for RTBDA.
RTA explanation
RTBDA is an advanced technique for making better decisions and taking meaningful
actions at precisely the right time. There are two important aspects of RTA. First,
real-time actions are treated as “streams of events”. To determine the action to be
performed when an event arrives, the system needs to capture, process
and analyse the parameters and attributes of the incoming event stream and
determine the corresponding stream category or group with regard to the application
domain. The categorized stream is then matched with an action that
is determined by a predefined model. It is important to develop this “model” in the
first phase of RTA. Second, RTA engines are stateless: they do not
need to retain previous incoming streams in order to determine the action for the
current stream[25].
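The event flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine described in [25]: the event fields, the keyword rules in `categorize` and the `ACTIONS` table are hypothetical stand-ins for a model that would in practice be built from historical data.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical predefined model: maps a stream category to an action.
# In practice this table would be derived from data collected over time.
ACTIONS: Dict[str, Callable[[dict], str]] = {
    "health": lambda e: f"alert clinician about post by {e['user']}",
    "travel": lambda e: f"recommend hotels to {e['user']}",
    "other": lambda e: "ignore",
}


def categorize(event: dict) -> str:
    """Assign an incoming event to a category using only its own attributes."""
    text = event.get("text", "").lower()
    if "symptom" in text or "dose" in text:
        return "health"
    if "hotel" in text or "flight" in text:
        return "travel"
    return "other"


def process(stream: Iterable[dict]) -> Iterator[str]:
    """Stateless processing: each event is handled without consulting earlier ones."""
    for event in stream:
        yield ACTIONS[categorize(event)](event)


events = [
    {"user": "ann", "text": "Missed a dose yesterday"},
    {"user": "bob", "text": "Any good hotel near Kandy?"},
    {"user": "cat", "text": "Good morning"},
]
results = list(process(events))
```

Because `process` keeps no state between events, the same function could in principle be run on many workers in parallel, which is one reason a stateless design suits stream processing.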
10 http://networkx.lanl.gov/
Figure 3.1: Expected growth in real time analytics by 2015
Figure 3.2: Capabilities of Operational Intelligence
FB, Twitter and other SNs use data records that are collected over a long period
of time. The model is developed by considering the nature of the application domain
(e.g. tourism, healthcare), not the individual records that reside in data repositories.
OI and LA are the core technical approaches for designing and developing RTA engines.
3.3 Lambda Architecture
LA, developed by Nathan Marz, achieves real-time processing by
decomposing the system into three layers: the batch layer, the serving layer and the
speed layer. Everything starts from the equation query = function(all data)[5].
Evaluating this function over all data for every event on the fly is computationally
very expensive. Instead, the query function is precomputed into a batch view, and the
result of a query is looked up in that view rather than calculated on the fly. The
precomputed view is indexed so that it can be accessed quickly with a few random reads.
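To make the idea concrete, the sketch below contrasts evaluating query = function(all data) on the fly with reading a precomputed, indexed view. The page-view records and the function names are illustrative assumptions and are not taken from [5] or [25].

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical master data set: immutable (user, page) view events.
MASTER_DATA: List[Tuple[str, str]] = [
    ("ann", "/hotels"),
    ("bob", "/hotels"),
    ("ann", "/flights"),
    ("cat", "/hotels"),
]


def query_on_the_fly(page: str) -> int:
    """query = function(all data): scans the full data set on every call."""
    return sum(1 for _, p in MASTER_DATA if p == page)


def compute_batch_view(data: List[Tuple[str, str]]) -> Dict[str, int]:
    """Precompute the answer for every page once; the dict acts as the index."""
    return dict(Counter(page for _, page in data))


batch_view = compute_batch_view(MASTER_DATA)


def query_batch_view(page: str) -> int:
    """Answer the same query with a single keyed lookup."""
    return batch_view.get(page, 0)
```

Both functions return the same counts, but the second one does work proportional to one lookup instead of the size of the data set, at the cost of the view being only as fresh as the last batch run.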
Figure 3.3: Overview of Lambda Architecture
3.3.1 Batch layer
The batch layer acts as the master: it stores the master data set (on HDFS) and
computes arbitrary batch views over it (with MapReduce)[25]. The
master data set can hold either historical data only or historical data together with
current data, depending on the business domain and key stakeholder interests. Apache
Hadoop is used to process the master data set and develop the required model.
Simplest pseudocode for the batch layer[25]:
function runBatchLayer():
    while (true):  // repeatedly recompute batch views from the beginning
        recomputeBatchViews()
3.3.2 Serving layer
The serving layer supports real-time querying. It loads and indexes the batch views
so that incoming queries can be answered with low latency rather than by
recomputing over the master data set. Apache Drill11 and Cloudera Impala12 are
low-latency query engines that are used to implement serving layer functions[25].
3.3.3 Speed Layer
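BP has substantial latency, so data that arrives while a batch run is in progress is
not yet reflected in the batch views. The speed layer compensates for this with
distributed SP over recent data. Apache Storm13 and Apache S414 are used to implement
this layer[25].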
3.4 Recommendation systems
FB, Twitter, LinkedIn, Google+ and all other leading SNs use recommendation systems.
These systems apply knowledge discovery techniques to the problem of making
personalized recommendations during a live interaction[21].
For example, consider a scenario where you add a friend on FB and FB automatically
suggests similar people to add (a generalized recommendation system).
The recommendation engine first finds the people who added the same person that you
added, people(1). From those, it determines the other people who were
added by people(1), called people(2). The system then presents people(2) as
recommended people to add in order to expand your network.
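The two-step lookup in this example can be sketched as follows. The data and the ranking rule (most frequently added first) are assumptions made for illustration; this is not a description of how FB actually computes its suggestions.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical "who added whom" data: user -> set of people that user has added.
ADDED: Dict[str, Set[str]] = {
    "you": {"dana"},
    "eli": {"dana", "fay", "gus"},
    "hal": {"dana", "gus"},
    "ivy": {"fay"},
}


def recommend(user: str, added: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Suggest people added by users who added someone the target user also added."""
    mine = added.get(user, set())
    # people(1): other users who added at least one person the target user added
    peers = [u for u, friends in added.items() if u != user and friends & mine]
    # people(2): everyone those peers added, excluding the user and existing friends
    counts = Counter(
        candidate
        for peer in peers
        for candidate in added[peer]
        if candidate != user and candidate not in mine
    )
    # Sort by count (descending), then by name, so the output is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


suggestions = recommend("you", ADDED)
```

Here `eli` and `hal` form people(1) because both added `dana`; `gus` is ranked above `fay` because two members of people(1) added him.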
11 https://github.com/apache/drill
12 http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
13 https://storm.apache.org/
14 http://incubator.apache.org/s4/
SN recommendations are determined by the number of Likes, clicks, user ratings
and emoticons. The algorithms fall mainly into two categories: content-based
algorithms and collaborative filtering algorithms. Content-based algorithms
recommend items that are similar to a target item. Collaborative filtering
uses previous, similar user activity such as clicks and ratings. Additionally, a
time window technique is adopted to give recommendations for specific time
periods (Google+ and Twitter trends, etc.).
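The difference between the two families, and the effect of a time window, can be shown with a small sketch. Everything here is an assumption chosen for illustration: the item tags, the timestamped likes, Jaccard overlap as the similarity measure and simple co-occurrence counting as the collaborative method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical item descriptions (content-based) and timestamped likes (collaborative).
ITEM_TAGS: Dict[str, Set[str]] = {
    "beach_hotel": {"beach", "hotel", "family"},
    "surf_camp": {"beach", "sport"},
    "city_museum": {"culture", "city"},
}
LIKES: List[Tuple[str, str, int]] = [  # (user, item, day on which the like happened)
    ("ann", "beach_hotel", 1), ("ann", "surf_camp", 2),
    ("bob", "beach_hotel", 2), ("bob", "city_museum", 9),
    ("cat", "beach_hotel", 3), ("cat", "surf_camp", 3),
]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def content_based(item: str) -> str:
    """Return the other item whose tags are most similar to the target item."""
    others = [i for i in ITEM_TAGS if i != item]
    return max(others, key=lambda i: jaccard(ITEM_TAGS[item], ITEM_TAGS[i]))


def collaborative(item: str, start: int, end: int) -> str:
    """Return the item most often liked by users who liked `item`, counting
    only likes inside the time window [start, end]; "" if there is none."""
    window = [(u, i) for u, i, day in LIKES if start <= day <= end]
    fans = {u for u, i in window if i == item}
    counts: Dict[str, int] = {}
    for u, i in window:
        if u in fans and i != item:
            counts[i] = counts.get(i, 0) + 1
    return max(counts, key=counts.get) if counts else ""
```

The content-based function never looks at user behaviour, while the collaborative one never looks at item descriptions; narrowing the window changes which likes count, which is how trend-style recommendations can be limited to recent activity.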
3.5 Web 2.0 IoT Architecture
Distributed Wireless Sensor Networks (WSNs) for sharing data, the Robot Operating
System (ROS) as a middleware platform and Radio Frequency Identification (RFID)
as an identification technology provide the core architectural infrastructure that
allows CO to recognize activities and, at the same time, incorporate knowledge into
smart objects. The Pachube platform15 provides fundamental API groundings for
developers to develop SIoT.
Figure 3.4: Architecture for SIoT Client Side and Server Side
It is important to understand SIoT network characteristics and relationships when
designing and developing smart environments. Four main types of relationships
exist in SIoT networks[3]:
• parental object relationship: This is a family-like structure based on the idea
that CO share similar characteristics with devices that were developed during the
same time period (the argument being that technology changes so rapidly).
15 http://datahub.io/dataset/pachube
• co-location object relationship: Object relationships need to be established during
the design and development of a smart environment, based on location
information.
• ownership object relationship: One person can be the owner of several CO. This
ownership information is vital when interacting with SNs of CO.
• social object relationship: Devices with similar characteristics can share best
practices to solve issues. The “Cloud-of-cloud” concept is a broader view that shares
the same idea, which can also be related to edge computing for IoT devices.
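The four relationship types listed above can be modelled with a small data structure. This is a deliberately simplified sketch: the `ConnectedObject` fields are hypothetical, and reducing each relationship to a single equality test is an assumption made for illustration, not the definition given in [3].

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class ConnectedObject:
    """Hypothetical description of a connected object (CO)."""
    name: str
    owner: str
    location: str
    production_year: int
    kind: str  # e.g. "thermostat" or "washing machine"


def relationships(a: ConnectedObject, b: ConnectedObject) -> Set[str]:
    """Return the SIoT relationship types that hold between two objects."""
    found: Set[str] = set()
    if a.production_year == b.production_year:
        found.add("parental")      # developed during the same time period
    if a.location == b.location:
        found.add("co-location")   # placed in the same environment
    if a.owner == b.owner:
        found.add("ownership")     # belong to the same person
    if a.kind == b.kind:
        found.add("social")        # similar devices that can share practices
    return found


washer = ConnectedObject("washer", "ann", "home-1", 2013, "washing machine")
heater = ConnectedObject("heater", "ann", "home-1", 2014, "thermostat")
office = ConnectedObject("office-heater", "bob", "office-7", 2014, "thermostat")
```

With this data, the washer and heater are linked by co-location and ownership, while the two thermostats are linked by the parental and social relationships even though they have different owners.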
3.6 Chapter summary
Security and privacy need to be given a great deal of attention when designing and
developing BDA SW and tools, as well as when developing algorithms for SN domains. In
SNA, “visualization” is an important aspect to consider when designing SW that can
analyse different user groups, and Gephi outperforms other network visualization
tools.
Lambda Architecture, which combines BP and SP, is used for RTA in BDSNA. OI is used
to develop models that can be applied in RTA. Recommendation systems
differ from domain to domain and use a number of user actions, such as
clicks, likes and ratings, in their algorithms.
The next and final chapter summarizes the literature survey and gives
future directions for BDSNA domains.
Chapter 4
Conclusion, Challenges and Future
Directions
This chapter summarises the overall survey and provides insight into future directions
for SNs in applying the gathered knowledge in practice.
4.1 Conclusion
Today even tech-averse people have an understanding of
SNs (like FB), although they are hardly aware of what search engines can do. Children,
youngsters, adults and even elderly people are making their presence felt on SNs. People
are eager to share their personal life stories and, on the other hand, like to peep
into other people's affairs. Interestingly, not only humans but also other
living and non-living objects are becoming users of SNs. The highly dynamic UGC
on SNs reflects user perspectives and feedback. UGC is not restricted to a particular
domain; it spreads across a vast variety of fields, and BDSNA helps in addressing a
wider range of stakeholder groups with more accurate BI.
This survey focused on four major domains (Healthcare, Defence and Security,
Travel and Tourism, Web 2.0 with IoT) in SNs. To derive useful knowledge and
recognize hidden patterns in user activities on SNs, it is important to differentiate
what is exciting and interesting among all activities. BDSNA is the solution. BDSNA
has taken these sectors to a new dimension, making them worthwhile for all interested
parties. Qualitative and quantitative results have been obtained through BDSNA to
give a better service to users of SNs. Business strategies and models are being
created to satisfy the demands of users. Predictive models, recommendation systems
and real time analytics play a major role in today’s BDSNA.
Modern BDSNA has been identified as one of the best approaches for
many business domains and has become an essential part of developing highly
sophisticated intelligence tools and SW.
4.2 Challenges and Future Directions
SNs like Facebook are considering cloud storage as a solution to accommodate
their growing data storage needs. As shown in Figure 4.1, the biggest challenge in
adopting cloud storage, as identified by all organizations, is security and privacy
violations. Even though a private cloud can provide additional security
mechanisms, cyber attackers are smart enough to identify loopholes and thereby
compromise data on a cloud. It is evident that as of yet there is no 100% guarantee
that cloud technology can be used as a trusted service.
Figure 4.1: Hosting data on cloud and challenges
Most UGC on SNs is irrelevant to the domain under consideration. Incomplete
text, multilingual content and bogus user feedback are difficult to cater
for in genuine analytics. Deriving algorithms and strategies from the user group of
one particular geography is not sufficient. Data scientists need to give more
attention to these factors when doing SNA. Also, TM and NLP are currently best
supported for micro-blogging content (Tweets are limited to a maximum of 140
characters). These techniques need to improve to a level where they can analyse much
longer text. A mechanism similar to YouTube's real-time translation would be quite
beneficial in the SN context for spreading awareness to a wide range of users.
SNs have a huge impact on human behaviour and intentions, and they have challenged
the conventional behavioural patterns of humans over recent years. FB can be used
to find a friend or relation, and LinkedIn is a place to find professionals. It is
apparent that SNs play the role of a “search engine”. Integrating proper indexing
methodologies would enhance the search function of SNs and give their users more
accurate results. Further, companies advertise their products and services on SNs.
In the near future, users will find it more compelling and attractive to use SNs for
their online shopping experiences. This highlights a big business opportunity for SNs
like FB but, on the other hand, some users may choose to stay away from SNs. The need
for shopping pattern analytics in SNs will also arise in the future. Just as we now
have different types of SNs for different purposes (FB and LinkedIn), there will be
more categories of SNs in the future. IoT will be a driving factor in diversifying SNs.
Bibliography
[1] Rajendra Akerkar. Big Data & Tourism: To promote innovation and increase. 2012.
[2] Lawrence Ampofo. 5 ways the internet of things
will change social media, October 2014. URL
http://www.business2community.com/social-media/5-ways-internet-things-will-cha
Accessed November,2014.
[3] Luigi Atzori, Antonio Iera, and Giacomo Morabito. SIoT: Giving a Social
Structure to the Internet of Things. 15(11):1193–1195, 2011.
[4] Ala Berzinji. Detecting Key Players in Terrorist Networks. 2011.
[5] Nathan Bijnens. A real-time Lambda Architecture using Hadoop & Storm.
NoSQL Matters Cologne, 2014.
[6] Jaap Bloem, Sander Duivestein, and Thomas Van Manen. Big Social Predicting
behavior with Big Data.
[7] BloombergTV. Can wearables and big data cure disease?, August 2014. URL
http://www.bloomberg.com/video/parkinson-s-disease-new-ways-to-study-illness-V
Accessed November,2014.
[8] David Bollier and Charles M Firestone. The Promise and Peril of Big Data.
2010. ISBN 0898435161.
[9] Jeff Cain. Social media in health care: the case for organizational policy and
employee education. American journal of health-system pharmacy : AJHP
: official journal of the American Society of Health-System Pharmacists, 68
(11):1036–40, June 2011. ISSN 1535-2900. doi: 10.2146/ajhp100589. URL
http://www.ncbi.nlm.nih.gov/pubmed/21593233.
[10] William M Campbell, Charlie K Dagli, and Clifford J Weinstein. with Content
and Graphs. 20(1), 2013.
[11] Neil Couch and Bill Robins. BIG DATA FOR DEFENCE AND SECURITY.
[12] Paul M. Davis. A tree that tweets, September 2010. URL
http://www.shareable.net/blog/a-tree-that-tweets. Accessed October, 2014.
[13] Yoree Koh and Don Clark. IBM and Twitter forge partnership on data
analytics, 2014. URL
http://online.wsj.com/articles/ibm-and-twitter-forge-partnership-on-data-analy
Accessed October, 2014.
[14] Mark Drapeau and Linton Wells II. Social Software and National Security: An
Initial Net Assessment. (April), 2009.
[15] Rob Faludi. New york times on botanicalls, again!, April 2013. URL
http://www.botanicalls.com/. Accessed October,2014.
[16] Roberta Floris and Michele Campagna. Social Media Data in Tourism Planning:
Analysing Tourists’ Satisfaction in Space and Time. 8(May):997–1003, 2014.
[17] California Healthcare Foundation. The Wisdom of Patients : Health Care Meets
Online Social Media. (April), 2008.
[18] Peter Groves and David Knott. The ‘ big data ’ revolution in healthcare.
(January), 2013.
[19] Dominique Guinard, Vlad Trifa, Friedemann Mattern, and Erik Wilde. From
the internet of things to the web of things: Resource-oriented architecture and
best practices. In Architecting the Internet of Things, pages 97–129. Springer,
2011.
[20] Carissa Hilliard. Social media for healthcare: A content analysis of MD
Anderson’s Facebook presence and its contribution to cancer support systems. of
Undergraduate Research in Communications, page 23.
[21] Jianming and Wesley W Chu. A Social Network-Based Recommender System
(SNRS).
[22] Maged N Kamel Boulos, Antonio P Sanfilippo, Courtney D Corley,
and Steve Wheeler. Social Web mining and exploitation for seri-
ous applications: Technosocial Predictive Analytics and related tech-
nologies for public health, environmental and national security surveil-
lance. Computer methods and programs in biomedicine, 100(1):16–23, Oc-
tober 2010. ISSN 1872-7565. doi: 10.1016/j.cmpb.2010.02.007. URL
http://www.ncbi.nlm.nih.gov/pubmed/20236725.
[23] Deborah Kotz. Using twitter as tool to track
side effects from drugs, April 2014. URL
http://www.bostonglobe.com/lifestyle/health-wellness/2014/04/30/using-twitter-
Accessed November,2014.
[24] Matthias Kranz, Luis Roalter, and Florian Michahelles. Things That Twitter :
Social Networks and the Internet of Things.
[25] Nathan Marz and James Warren. Big Data: Principles and best practices of
scalable realtime data systems.
[26] Roberta Milano. The effects of online social media on tourism websites. 2011.
[27] Mark Million. Washing machine twitters
when clothes are done, January 2009. URL
http://latimesblogs.latimes.com/technology/2009/01/twitter-washing.html.
Accessed November,2014.
[28] Alan Mislove, Hema Swetha Koppula, Krishna P Gummadi, Peter Druschel, and
Bobby Bhattacharjee. Growth of the flickr social network. In Proceedings of the
first workshop on Online social networks, pages 25–30. ACM, 2008.
[29] Chamin Nalinda. Social network analysis tools and softwares, October 2014. URL
http://techspiro.blogspot.com/2014/10/social-network-analysis-tools-softwares.
Accessed October,2014.
[30] Mary K Obenshain. Application of Data Mining Techniques to Healthcare Data.
(August):690–695, 2004.
[31] Onook Oh, Manish Agrawal, and H Raghav Rao. Information control and terrorism:
Tracking the Mumbai terrorist attack through Twitter. Information Systems
Frontiers, 13(1):33–43, 2011.
[32] Chintan Patel. Now you can talk to twitter and
find clinical trials on trialx, December 2012. URL
http://trialx.com/enablers/2009/03/now-you-can-talk-to-twitter-and-find-clinic
Accessed November,2014.
[33] Loredana Di Pietro, Francesca Di Virgilio, and Eleonora Pantano. So-
cial network for the choice of tourist destination: attitude and be-
havioural intention. Journal of Hospitality and Tourism Technology, 3(1):
60–76, 2012. ISSN 1757-9880. doi: 10.1108/17579881211206543. URL
http://www.emeraldinsight.com/10.1108/17579881211206543.
[34] Angelo Presenza and Maria Cipollina. Analysis of links and features of tourism
destination’s stakeholders. an empirical investigation of a south italian region.
2009.
[35] Pslulfdo and Ehwzhhq. An Empirical Study on the Relationship between Twitter
Sentiment and influence in Tourism Domain. 2012.
[36] Laura McCarthy, Debra Stock, and Rohit Verma. How Travelers Use Online and
Social Media Channels to Make Hotel-choice Decisions. Cornell Hospitality Report,
10(18), 2010.
[37] Steve Ressler. Social network analysis as an approach to combat terrorism:
past, present, and future research. Homeland Security Affairs, 2006. URL
http://www.hsaj.org/?download&mode=dl&h&w&drm=resources%2Fvolume2%2Fissue2%2Fp
[38] Lewis Robinson. A tweet from your toaster: How the in-
ternet of things will affect social media, May 2014. URL
http://www.socialmediatoday.com/content/tweet-your-toaster-how-internet-things
Accessed November,2014.
[39] Philip Russom. TDWI Checklist Report: Operational Intelligence: Real-Time
Business Analytics from Big Data.
[40] Philip Russom. TDWI Research: Big Data. 2011.
[41] Series, Chris Cooper, C Michael Hall, New Zealand, Noel Scott, and Rodolfo
Baggio. Network Analysis and Tourism From Theory to Practice.
[42] Amit Sheth, Boanerges Aleman-meza, I Budak Arpinar, Chris Halaschek, and
Cartic Ramakrishnan. Semantic Association Identification and Knowledge Dis-
covery for National Security Applications. 16(March):1–16, 2005.
[43] Sung-bum and Dae-young Kim. TRAVEL INFORMATION SEARCH BEHAV-
IOR AND SOCIAL NETWORKING.
[44] Sarawut Supattranuwong and Sukree Sinthupinyo. Applying Data Mining to
Analyze Travel Pattern in Searching Travel Destination Choices. pages 38–44,
2013.
[45] Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba Borthakur, Na-
mit Jain, Joydeep Sen Sarma, Raghotham Murthy, and Hao Liu.
Data warehousing and analytics infrastructure at facebook. Proceed-
ings of the 2010 international conference on Management of data - SIG-
MOD ’10, page 1013, 2010. doi: 10.1145/1807167.1807278. URL
http://portal.acm.org/citation.cfm?doid=1807167.1807278.
[46] Tim O’Reilly. What Is Web 2.0. URL
http://oreilly.com/web2/archive/what-is-web-20.html.
[47] Alex Woodie. How big data analytics can help fight isis, October 2014. URL
http://www.datanami.com/2014/10/14/big-data-analytics-can-help-fight-isis/.
Accessed November,2014.