The document discusses big data analytics and related topics. It covers the evolution of technology and gives an overview of big data analytics, including the 5 V's (volume, variety, velocity, value, and veracity). It also discusses research topics in big data, tools and software, literature surveys on various big data studies, identified research gaps, and a proposed activity chart and bibliography.
1. Big Data Analytics
by
GOWRU BHARATH KUMAR
(Regd.No.121960304003)
Under the Esteemed Guidance of
Prof. G. Appa Rao
Professor
Department of CSE
GITAM Institute of Technology
GITAM Deemed to be University, Visakhapatnam
2. Agenda
• Evolution of Technology
• About Big Data Analytics
• 5 V's of Big Data
• Research topics in Big Data
• Tools and Software
• Literature Survey
• Research Gaps
• Research work activity chart
• Bibliography
4. About Big Data Analytics
• Big data is anything beyond the human and technical infrastructure needed to support its storage, processing, and analysis.
• Today's BIG may be tomorrow's NORMAL.
• 5 V's (the V's associated with big data may grow with time).
• Big data literally means a large amount of data.
• Big data is currently receiving greater attention because of the immense need for it.
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• Big data is everywhere.
• Data analytics is the process of seeking knowledge in the data, through the application of a methodology, in order to make a better-supported business decision.
5. 5 V's of Big Data
1) Volume
• Data at rest
• Bits -> Bytes -> KB -> MB -> GB -> TB -> PB -> EB -> ZB -> YB
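The unit ladder above can be made concrete with a small conversion helper. This is a minimal sketch, assuming decimal (SI) steps of 1000 between units; some systems use binary steps of 1024 instead. The function name `human_readable` is hypothetical and chosen only for illustration.

```python
# Unit ladder from the slide, starting at bytes.
# Assumption: each step is a factor of 1000 (decimal/SI convention).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits in this unit, or when no larger unit exists.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000
    raise AssertionError("unreachable: the loop always returns")


def bits_to_bytes(bits: int) -> float:
    """First step of the ladder: 8 bits make one byte."""
    return bits / 8


if __name__ == "__main__":
    print(human_readable(1500))
    print(human_readable(10**15))
```

For example, `human_readable(10**15)` reports one petabyte under the decimal assumption.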
6. 5 V's of Big Data cont…
2) Variety
• Data in many forms
• Different kinds of data are being generated from various sources.
1. Structured data: Data conforms to a pre-defined structure.
Sources: Oracle, DB2, Teradata, MySQL, spreadsheets, OLTP, etc.
2. Semi-structured data: Data does not follow a rigid schema but carries tags or keys that describe its structure.
Sources: XML, JSON, other markup languages, etc.
3. Unstructured data: Data does not conform to any pre-defined data model.
Sources: web pages, images, free-form text, audio, video, etc.
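The three kinds of data can be contrasted with a short example. This is only an illustrative sketch using the Python standard library; the sample records, names, and the helper `field_names` are made up for the illustration and do not come from the slides.

```python
import csv
import io
import json

# Structured: every row conforms to the same pre-defined columns.
structured = io.StringIO("id,name,city\n1,Asha,Visakhapatnam\n2,Ravi,Hyderabad\n")
rows = list(csv.DictReader(structured))

# Semi-structured: records describe themselves through keys,
# but two records need not share the same shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["student"]},'
    ' {"id": 2, "name": "Ravi", "address": {"city": "Hyderabad"}}]'
)

# Unstructured: free-form text with no data model; any structure must be inferred.
unstructured = "Asha moved to Visakhapatnam last year. Ravi still lives in Hyderabad."
words = unstructured.replace(".", "").split()


def field_names(record: dict) -> set:
    """Return the set of keys a record exposes."""
    return set(record.keys())


if __name__ == "__main__":
    print(rows[0]["city"])                      # column lookup works for every row
    print(field_names(semi_structured[0]))      # keys differ between records
    print(field_names(semi_structured[1]))
    print(len(words))                           # only raw tokens are available
```

The structured rows can be queried by column name directly, the JSON records must be inspected key by key, and the free text offers nothing but tokens until further processing is applied.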
7. 5 V's of Big Data cont…
3) Velocity
• Data in motion
• We have moved from the days of batch processing to real-time processing.
Batch -> Periodic -> Near real time -> Real-time processing
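The difference between the two ends of this progression can be sketched with a simple average. This is a minimal illustration, not a description of any particular streaming system: the batch version waits for the full data set, while the streaming version updates its result as each value arrives. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(readings: Sequence[float]) -> float:
    """Batch processing: collect all the data first, then compute once."""
    return sum(readings) / len(readings)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Real-time processing: emit an updated mean after every new reading."""
    count = 0
    mean = 0.0
    for x in readings:
        count += 1
        # Incremental update; no history of earlier readings is stored.
        mean += (x - mean) / count
        yield mean


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]
    print(batch_mean(data))
    print(list(streaming_mean(data)))
```

Both approaches agree once all readings have been seen; the streaming version simply makes an answer available at every step along the way.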
8. 5 V's of Big Data cont…
4) Value
• Data in money
• A mechanism to bring the correct meaning out of the data.
9. 5 V's of Big Data cont…
5) Veracity
• Data in doubt
• Uncertainty and inconsistencies in the data
10. Research topics in Big Data
1. Information Management
2. Business Analytics
3. De-duplication
4. Pattern detection
5. Data integrity & Quality
6. Data transformation
7. Legal and Regulatory Issues and Governance
8. Heterogeneity and incompleteness
9. Data Privacy & Ethics
10. Data visualization
11. Big Data security
12. Big Data issues
13. Processing, Storage and Management Issues
14. IoT with Big Data
15. Problems on Natural Language Processing
11. Research topics in Big Data cont…
Information Management
• Big Data Computing Platforms
• Big Data Computation
• Big Data Storage
• Big Data Computational Limitations
• Big Data Emerging Technologies
Business Analytics
• Descriptive Analytics
• Predictive Analytics
• Prescriptive Analytics
• Consumption of Analytics
Data Privacy & Ethics
• Privacy, defined mainly as a human value, consists of four rights: solitude, anonymity, intimacy and reserve.
• Data privacy focuses on the use and governance of an individual's personal data, for example making policies to ensure that consumers' personal information is collected, shared and used in the right ways.
• Privacy is commonly equated with the protection of personal or sensitive data.
Ex: SSN, bank account numbers, credit card numbers, personal health information
12. Research topics in Big Data cont…
De-duplication
De-duplication is a technique used to reduce the storage space needed by storing just pointers to the duplicate data instead of storing
the whole file.
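The pointer idea described above can be sketched as a small content-addressed store. This is a toy illustration under stated assumptions: duplicates are detected at whole-file level by a SHA-256 digest, everything lives in memory, and the class name `DedupStore` and its methods are hypothetical. Production systems typically de-duplicate at block or chunk level and persist their index.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Toy store in which identical content is kept only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}    # digest -> unique content
        self.pointers: Dict[str, str] = {}   # file name -> digest (the "pointer")

    def put(self, name: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        # Store the content only if this digest has not been seen before.
        self.blobs.setdefault(digest, content)
        # A duplicate file costs only one pointer entry.
        self.pointers[name] = digest

    def get(self, name: str) -> bytes:
        """Follow the pointer to recover the file's content."""
        return self.blobs[self.pointers[name]]

    def stored_bytes(self) -> int:
        """Bytes of unique content actually held."""
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    store = DedupStore()
    report = b"quarterly sales report" * 100
    store.put("report.txt", report)
    store.put("report_copy.txt", report)      # duplicate: only a pointer is added
    store.put("notes.txt", b"meeting notes")
    print(len(store.pointers), "files,", len(store.blobs), "unique blobs")
    print(store.stored_bytes(), "bytes stored")
```

Two of the three files share one stored blob, so the space used is that of the unique content only, while every file name still resolves to its full content.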
Data visualization
• Visualizing data is a technique that facilitates the identification of patterns in data and presents data in a more consumable form.
• Charts, graphs and dashboards have been used for decades to synthesize data into a cohesive format for business analysts, managers and executives.
13. Tools and Software
• Hadoop - Open-source framework used to store and process big data.
• Splice Machine - Used to derive real-time actionable insights.
• MarkLogic - Deals with heavy data loads.
• SAP in-memory - Analyzes large data workloads.
• Cambridge Semantics - Used to collect, integrate and analyse big data.
• MongoDB - Helps to have precise control over the final results.
• Pentaho - Used to visualize, analyze and blend big data.
• Talend - Open-source data integration tool that improves as its community tweaks it.
• Tableau - Data visualization platform that also offers tools for developers.
• Splunk - Used to harness machine data created from different sources.
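Hadoop's processing layer is built around the MapReduce model, which can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle and reduce phases only; it does not use the Hadoop API, and Hadoop itself distributes these phases across the nodes of a cluster. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Because each map call looks at one line and each reduce call looks at one key, both phases can in principle run in parallel on separate machines, which is what makes the model suitable for large data sets.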
14. Literature Survey
Title 1: Improving Agility Using Big Data Analytics: The Role of Democratization Culture
Remarks:
• Big data analytics (BDA) is considered an enabler of organizational agility because it helps firms to sense market-based changes and improve decision making in a more informed and timely manner. However, in reality, only a handful of firms have achieved improvement in their outcomes by using BDA.
Future work:
• To address this inconsistency, the study explores the conditions under which BDA use translates into agility.
Title 2: Platforms oriented business and data analytics in digital ecosystem
Remarks:
• A platform-oriented business approach is a prerequisite for any leading organization that wants to keep its customers active and engaged and to compete in the platform-focused digital age.
Future work:
• The study focuses on the strategic revolution in the financial sector, examined with various econometric tools.
15. Literature Survey cont…
Title 3: Automatic Visual Recommendation for Data Science and Analytics
Remarks:
• Analysing datasets that have many attributes can be a cumbersome process and lead to errors.
Future work:
• The goal of this research paper is to automatically recommend interesting visualization patterns using optimized datasets from different databases.
• It reduces the time spent on low-utility visualizations and displays recommended patterns.
Title 4: Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark
Remarks:
• The frameworks are compared on performance, cost, ease of use, compatibility, data processing, failure tolerance, and security.
Future work:
• To provide better support to software developers dealing with big data problems, new programming platforms are continuously being developed. In this research work, a comparative analysis of Hadoop MapReduce and Spark is presented on the basis of working principle, performance, cost, ease of use, compatibility, data processing, failure tolerance, and security.
16. Literature Survey cont…
Title 5: Big data analytics for smart factories of the future
Remarks:
• Continued advancement of sensors has led to an ever-increasing amount of data of various physical natures being acquired from production lines. As rich information relevant to the machines and processes is embedded within this "big data", how to effectively and efficiently discover patterns in it to enhance productivity and economy has become both a challenge and an opportunity.
Future work:
• This paper discusses essential elements of, and promising solutions enabled by, data science that are critical to processing data of high volume, velocity and variety and low veracity, towards the creation of added value in smart factories of the future.
Title 6: Technologies and Issues in Big Data Analytics and Applications
Remarks:
• Analyzing, or even simply capturing, a huge volume of data is impractical. Several reports can be prepared from this data.
• The process behind preparing these reports is also a challenging task for software developers.
Future work:
• This paper focuses on several issues, such as challenges in data, challenges in the process, and challenges in data management. The challenge relates to how to manipulate an impressive volume of data so that it reaches its destination intact.
17. Literature Survey cont…
Title 7: SICE: An improved missing data imputation technique
Remarks:
• In data analytics, missing data is a factor that degrades performance. Incorrect imputation of missing values could lead to a wrong prediction. In this era of big data, when a massive volume of data is generated every second and the utilization of these data is a major concern to stakeholders, efficiently handling missing values becomes more important.
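To make the idea of imputation concrete, the sketch below fills missing values with the mean of the observed ones. This is a simple baseline shown only for illustration; it is not the SICE technique from the paper, whose details are not given in these slides. The function name `impute_mean` and the sample values are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if x is None else x for x in column]


if __name__ == "__main__":
    # A well-behaved column: the filled value is plausible.
    print(impute_mean([1.0, None, 3.0]))
    # A column with an outlier: the filled value is pulled far from typical values,
    # which is how an incorrect imputation can later distort a prediction.
    print(impute_mean([1.0, 1.0, None, 100.0]))
```

The second call shows why the choice of imputation method matters: a single extreme value shifts the fill value well away from the typical observations.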
Title 8: A big data analytics framework for detecting user-level depression from social networks
Remarks:
• Depression is one of the most common mental health problems worldwide. The diagnosis of depression is usually done by clinicians based on mental status questionnaires and the patient's self-reporting. Not only do these methods depend heavily on the current mood of the patient, but people who experience mental illness are also often reluctant to seek help.
• Social networks have become a popular platform for people to express their feelings and thoughts with friends and family. With the substantial amount of data in social networks, there is an opportunity to design novel frameworks to identify those at risk of depression.
18. Literature Survey cont…
Title 9: Effective and efficient usage of big data analytics in public sector
Remarks:
• Technological advancements and data security are among the most important factors that may impact the effectiveness and efficiency of big data usage. Authentication, governments' focus on it, and transparency and accountability are the most important factors in the techno-centric, governmental-centric and user-centric categories, respectively.
Title 10: Challenges and Uses of Big Data Analytics for Social Media
Remarks:
• A vast quantity of unstructured data is being captured and analyzed on social media. The paper focuses on the need for a standard approach that could be employed across all social networking websites.
• This is owing to the assortment of formats of big data found on these websites. The matter poses a challenge not merely in capturing big data but also in the analysis and delivery of relevant information, which involve executive
19. Research Gaps
• Improving Agility using Big Data Analytics
To address the inconsistency, the study explores the conditions under which BDA use translates into agility.
• Detecting user-level depression from social networks
Social networks have become a popular platform for people to express their feelings and thoughts with friends and family. With the substantial amount of data in social networks, there is an opportunity to design novel frameworks to identify those at risk of depression.
• Improving Platforms oriented business and data analytics in digital ecosystem
The study focuses on the strategic revolution in the financial sector, examined with various econometric tools.
• Performance Analysis of Distributed Computing Frameworks for Big Data Analytics
To provide better support to software developers dealing with big data problems, new programming platforms are continuously being developed. In this research work, a comparative analysis of Hadoop MapReduce and Spark is presented on the basis of working principle, performance, cost, ease of use, compatibility, data processing, failure tolerance, and security.
20. Research Gaps cont…
• Big data analytics for smart factories of the future
Essential elements of, and promising solutions enabled by, data science that are critical to processing data of high volume, velocity and variety and low veracity, towards the creation of added value in smart factories of the future.
• An improved missing data imputation technique
Missing data is a factor that degrades performance. Incorrect imputation of missing values could lead to a wrong prediction. In this era of big data, when a massive volume of data is generated every second and the utilization of these data is a major concern to stakeholders, efficiently handling missing values becomes more important.
• Effective and efficient usage of big data analytics in public sector
Technological advancements and data security are among the most important factors that may impact the effectiveness and efficiency of big data usage. Authentication, governments' focus on it, and transparency and accountability are the most important factors in the techno-centric, governmental-centric and user-centric categories, respectively.
• Challenges and Uses of Big Data Analytics for Social Media
This is owing to the assortment of formats of big data found on these websites. The matter poses a challenge not merely in capturing big data but also in the analysis and delivery of relevant information, which involve executive.
21. Research Work Activity Chart
Timeline columns: 2019-20, 2020-21, 2021-22, 2022-23
Activities:
• Admission
• 1st Test for Pre-PhD
• 2nd Test for Pre-PhD
• 3rd Test for Pre-PhD
• Literature Survey
• Presentation of Papers in International Journals / Conferences on Research Work
• GITAM University Seminar I
• Comparison of various Privacy methods
• Testing and Validation
• Presentation of Papers in International Journals / Conferences on Research Work
• Development of new Privacy Model
• Testing and Validation
• Presentation of Papers in International Journals / Conferences on Research Work
• Thesis preparation
• Pre-Submission Seminar
• Anti-Plagiarism Test
• Thesis Submission
22. Bibliography
1. Youyung Hyun, Taro Kamioka, Ryuichi Hosoya, Improving Agility Using Big Data Analytics: The Role of Democratization Culture, ISSN: 1943-7536.
2. Shrutika Mishra and A. R. Tripathi, Platforms oriented business and data analytics in digital ecosystem, International Journal of Financial Engineering, Vol. 06, No. 04, 1950036 (2019).
3. Manoj Muniswamaiah, Tilak Agerwala, Charles C. Tappert, Automatic Visual Recommendation for Data Science and Analytics, Future of Information and Communication Conference, FICC 2020: Advances in Information and Communication, pp. 125-132.
4. Sachin K. Mangla, Rakesh Raut, Vaibhav S. Narwane, Zuopeng (Justin) Zhang, Pragati Priyadarshinee, Mediating effect of big data analytics on project performance of small and medium enterprises, Journal of Enterprise Information Management, ISSN: 1741-0398.
5. Shwet Ketu, Pramod Kumar Mishra, Sonali Agarwal, Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark, An International Journal Computer Science and Applications, ISSN 2007-9737.
6. Robert X. Gao, Lihui Wang, Moneer Helu, Roberto Teti, Big data analytics for smart factories of the future, https://www.sciencedirect.com/science/article/abs/pii/S0007850620301359
7. Shahidul Islam Khan and Abu Sayed Md Latiful Hoque, SICE: An improved missing data imputation technique, https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00313-w
8. Xingwei Yang, Rhonda McEwen, Liza Robee Ong, Morteza Zihayat, A big data analytics framework for detecting user-level depression from social networks, https://www.sciencedirect.com/science/article/abs/pii/S0268401219313325
9. Mudassir Khan, Mohd Dilshad Ansari, Syed Yasmeen Shahdad, Challenges and Uses of Big Data Analytics for Social Media, https://link.springer.com/chapter/10.1007/978-981-15-1420-3_118