Data Science
Data science is a field of applied mathematics and
statistics that extracts useful information from large
amounts of complex data, or big data. It uses scientific
approaches, procedures, algorithms, and frameworks to
extract knowledge and insight from huge amounts
of data. Data science brings together
ideas, data analysis, machine learning, and their
related methods to understand and analyze real-world
phenomena with data.
KEY TAKEAWAYS
• Data science uses techniques such as machine learning
and artificial intelligence to extract meaningful
information and to predict future patterns and behaviors.
• Advances in technology, the internet, and social media
have all increased access to big data.
• The field of data science is growing as technology
advances and big data collection and analysis techniques
become more sophisticated.
Statistics:-
Mathematics is at the core of almost all advances in technology. The field of data
science would not exist without it.
Machine learning and statistics are the two core skills required to become a data scientist. Statistics is the heart of data
science: it helps to analyze, transform, and predict data. Statistics is the branch of mathematics in which tables of data are
operated upon to calculate metrics such as the mean, median, and standard deviation. These metrics are then used to characterize the
available data so that it can be used in decision-making processes.
7 Basic Statistics Concepts For Data Science:-
1. Descriptive Statistics:-
Descriptive statistics describe the basic features of a data set by summarizing it; the data set can represent either the
entire population or a sample of the population. The summary is derived from calculations that include:
Mean: The central value, commonly known as the arithmetic average.
Mode: The value that appears most often in a data set.
Median: The middle value of the ordered data set, which divides it into two equal halves.
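The three measures above can be computed with Python's standard `statistics` module. The sketch below is only an illustration; the `orders` list is hypothetical data invented for this example.

```python
import statistics

# Hypothetical sample: daily order counts, invented for illustration.
orders = [12, 15, 12, 18, 20, 15, 12]

mean_value = statistics.mean(orders)      # arithmetic average
median_value = statistics.median(orders)  # middle value of the sorted data
mode_value = statistics.mode(orders)      # most frequent value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Note that the mean is pulled upward by the larger values (18 and 20), while the median and mode are not.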
2. Variability:-
• Variability includes the following measures:
• Standard Deviation: A statistic that measures the dispersion of a data set relative to its mean. It is the square root of the variance.
• Variance: A statistical measure of the spread between the numbers in a data set. In general terms, it is the average of the
squared differences from the mean. A large variance indicates that the numbers are far from the mean. A small
variance indicates that the numbers are close to the mean. Zero variance indicates that all the values in
the set are identical.
• Range: The difference between the largest and smallest values of a data set.
• Percentile: A measure that indicates the value below which a given percentage of the
observations in the data set falls.
• Quartile: One of the three values that divide the ordered data points into four equal parts.
• Interquartile Range: The spread of the middle half of the data, that is, the difference between the third and first quartiles. It covers the middle 50% of the data set.
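As a minimal sketch, the measures of variability listed above can be computed with the standard `statistics` module. The `data` list is hypothetical, and the population versions of variance and standard deviation (`pvariance`, `pstdev`) are an assumption made here; for a sample, `variance` and `stdev` would be used instead.

```python
import statistics

# Hypothetical values, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance
value_range = max(data) - min(data)    # largest value minus smallest value

# quantiles(n=4) returns the three quartile cut points Q1, Q2 (the median), Q3,
# which correspond to the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                          # spread of the middle 50% of the data

print("variance:", variance)
print("standard deviation:", std_dev)
print("range:", value_range)
print("quartiles:", q1, q2, q3)
print("interquartile range:", iqr)
```

Different tools use slightly different interpolation rules for quartiles, so the exact cut points may differ between libraries for small data sets.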
3. Correlation:-
• Correlation is one of the major statistical techniques for measuring the relationship between two variables. The correlation coefficient
indicates the strength and direction of the linear relationship between two variables and ranges from -1 to 1.
• A correlation coefficient greater than zero indicates a positive relationship.
• A correlation coefficient less than zero indicates a negative relationship.
• A correlation coefficient of zero indicates that there is no linear relationship between the two variables.
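The sketch below computes the Pearson correlation coefficient by hand to illustrate the three cases above. The function name `pearson_r` and all data lists are hypothetical, chosen only for this example. The last case shows why a coefficient of zero rules out only a linear relationship: the values of `squares` depend entirely on `centered`, yet the coefficient is zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]   # rises with hours: positive relationship
falling = [10, 8, 6, 4, 2]      # falls with hours: negative relationship
centered = [-2, -1, 0, 1, 2]
squares = [x * x for x in centered]  # related to centered, but not linearly

r_pos = pearson_r(hours, scores)
r_neg = pearson_r(hours, falling)
r_zero = pearson_r(centered, squares)

print(r_pos, r_neg, r_zero)
```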
4. Probability Distribution:-
• A probability distribution specifies the likelihood of all possible events. In simple terms, an event is the result of an experiment, such as tossing a
coin. Events are of two types: dependent and independent.
• Independent event: An event is independent when it is not affected by earlier events. For example,
when a coin is tossed and the first outcome is heads, the outcome of the next toss may be
heads or tails, and it is entirely independent of the first trial.
• Dependent event: An event is dependent when its occurrence depends on earlier events. For
example, consider drawing balls without replacement from a bag that contains red and blue balls. If the first ball drawn is red, the second ball
may be red or blue, but its probability depends on the first trial.
The probability that several independent events all occur is calculated by multiplying the probabilities of the individual events; for dependent
events, it is calculated using conditional probability.
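The two rules can be worked through with exact fractions. In this sketch the coin is assumed to be fair, and the bag is assumed to hold 3 red and 2 blue balls drawn without replacement; these counts are hypothetical, since the text does not give any.

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin.
# P(heads, then heads) = P(heads) * P(heads)
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

# Dependent events: hypothetical bag with 3 red and 2 blue balls,
# drawn without replacement.
red, blue = 3, 2
p_first_red = Fraction(red, red + blue)
# Conditional probability: one red ball is already gone.
p_second_red_given_first_red = Fraction(red - 1, red + blue - 1)
# P(red, then red) = P(first red) * P(second red | first red)
p_two_reds = p_first_red * p_second_red_given_first_red

print("P(two heads) =", p_two_heads)
print("P(two reds)  =", p_two_reds)
```

Multiplying the unconditional probabilities for the bag (3/5 times 3/5) would give the wrong answer, because the first draw changes what remains in the bag.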
5. Regression:-
Regression is a method used to determine the relationship between one or more independent variables and a dependent variable.
Regression is mainly of two types:
• Linear regression: It is used to fit a regression model that explains the relationship between a numeric response variable
and one or more predictor variables.
• Logistic regression: It is used to fit a regression model that explains the relationship between a binary response variable
and one or more predictor variables.
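A minimal sketch of linear regression with a single predictor, fitted by ordinary least squares, is shown below. The function name `fit_line` and the data are hypothetical; the points were chosen to lie exactly on the line y = 2x + 1 so that the result is easy to check. Logistic regression is not covered here.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of products of deviations / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]   # predictor variable
ys = [3, 5, 7, 9, 11]  # numeric response variable

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted response for x = 6

print("slope:", slope, "intercept:", intercept, "prediction:", prediction)
```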
6. Normal Distribution:-
The normal distribution defines the probability density function for a continuous random variable. The normal
distribution has two parameters, the mean and the standard deviation, both discussed above; the standard normal distribution is the special case with mean 0 and standard deviation 1. When the distribution of a random
variable is unknown, the normal distribution is often used as an approximation. The central limit theorem justifies this for averages: the means of
sufficiently large samples are approximately normally distributed.
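The sketch below uses `statistics.NormalDist` from the Python standard library to evaluate the standard normal distribution, then runs a small simulation as an informal illustration of the central limit theorem. The sample size (30), the number of samples (2000), the seed, and the choice of a uniform source distribution are all assumptions made for this example.

```python
import random
import statistics
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)
density_at_mean = standard_normal.pdf(0)  # height of the bell curve at its peak
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("density at the mean:", round(density_at_mean, 4))
print("P(-1 < Z < 1):", round(within_one_sd, 4))

# Central limit theorem, informally: means of samples drawn from a
# non-normal (uniform) distribution cluster around the true mean 0.5.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample_means = [
    statistics.mean(rng.random() for _ in range(30))
    for _ in range(2000)
]
mean_of_means = statistics.mean(sample_means)
spread_of_means = statistics.stdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("spread of sample means:", round(spread_of_means, 3))
```

The spread of the sample means should be close to the theoretical value sqrt(1/12) / sqrt(30), which is roughly 0.053 for uniform draws on [0, 1).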
7. Bias:-
• In statistical terms, bias occurs when a model or sample is not representative of the complete population. It needs to be minimized to get
the desired outcome.
• The three most common types of bias are:
• Selection bias: It occurs when the data selected for statistical analysis is not randomized, so that the data
is unrepresentative of the whole population.
• Confirmation bias: It occurs when the person performing the statistical analysis has some predefined assumption.
• Time interval bias: It is caused by intentionally specifying a certain time range to favor a particular outcome.
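Selection bias can be illustrated with a small simulation. In the sketch below, the population (the integers 1 to 100), the biased rule (keeping only the 20 largest values), the sample size, and the seed are all hypothetical choices made for this example.

```python
import random
import statistics

# Hypothetical population: the values 1..100.
population = list(range(1, 101))
population_mean = statistics.mean(population)

# Selection bias: only the 20 largest values are selected,
# so the sample is not representative of the population.
biased_sample = sorted(population)[-20:]
biased_mean = statistics.mean(biased_sample)

# Randomized selection: every value has the same chance of being picked.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 50)
random_mean = statistics.mean(random_sample)

print("population mean:", population_mean)
print("biased sample mean:", biased_mean)
print("random sample mean:", random_mean)
```

The biased sample overstates the population mean by a wide margin, while the random sample typically lands much closer to it.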
Programming Tools Used in Data Science
A data scientist extracts, manipulates, and pre-processes data and
generates forecasts from it. To do this, the data scientist needs
various statistical tools and programming
languages. In this article, we will discuss some data
science tools that data scientists use to carry out data
operations, along with the main features
of the tools, their benefits, and a comparison of different
data science tools.
Top Data Science Tools:-
1. SAS
SAS is one of the data science tools
designed purely for statistical operations. It is
proprietary, closed-source software used by large companies
to analyze data. It is commonly used by professionals and businesses
working on commercial software. SAS provides data
scientists with numerous statistical libraries and
tools for modelling and organizing data. Although SAS is
highly reliable and has strong support, it is expensive
and used mainly by larger companies. Moreover, several SAS
libraries and packages are not included in the base package and
require costly upgrades.
2. Apache Spark
Apache Spark, or simply Spark, is a powerful analytics engine and one of the most commonly used data science
tools. Spark is designed specifically for batch and stream processing. Spark can handle streaming data better
than many other big data platforms. Spark's strongest integration is with Scala, a programming language that runs on the Java Virtual Machine
and is therefore cross-platform in nature.
Features of Apache Spark:
• High processing speed.
• Advanced analytics.
• Real-time stream processing.
• Dynamic in nature.
• Fault tolerance.
3. BigML
BigML is another widely used data science tool. It offers an interactive, cloud-based GUI environment for processing machine
learning algorithms. BigML provides standardized cloud-based software for industry use. It allows businesses to use machine learning algorithms across multiple
areas of their enterprise. BigML specializes in predictive modelling. It uses a wide
range of machine learning algorithms, including clustering and classification. You can create a free account or a premium
account, depending on your data needs, and use BigML through its web interface or its REST APIs. It provides interactive data
visualizations and lets you export visual charts to your mobile or IoT devices.
4. Excel
Excel was created by Microsoft mainly for spreadsheet calculations and is now commonly used for data processing, visualization, and
complex calculations. Excel is an efficient analytical tool for data science. It has many formulas, tables, filters,
slicers, and so on. You can also create your own custom functions and formulas with Excel. While Excel is still a good
option for powerful data visualization and tables, it is not intended for calculations on huge quantities of data. You can also connect
SQL to Excel and use it for data management and analysis. Many data scientists use Excel as an interactive graphical tool
for easy pre-processing of data. In general, Excel is a good tool for data analytics at a small, non-enterprise
level.
Features of Excel:
• It is popular for small-scale data analysis.
• It is used for spreadsheet calculation and visualization.
• The Excel Analysis ToolPak is used for complex data analysis.
• It provides an easy connection with SQL.
5. D3.js
6. MatLab
7. NLTK
8. TensorFlow
9. Weka
10. Jupyter
11. Tableau
12. Scikit-learn
