BIOSTATISTICS AND RESEARCH METHODOLOGY
Unit-4: statistical analysis
PRESENTED BY
Himanshu Rasyara
B. Pharmacy IV Year
UNDER THE GUIDANCE OF
Gangu Sreelatha M.Pharm., (Ph.D)
Assistant Professor
CMR College of Pharmacy, Hyderabad.
email: sreelatha1801@gmail.com
• What is Microsoft Excel?
• Microsoft Excel is a spreadsheet program used to record and analyse numerical and statistical data.
Microsoft Excel provides multiple features to perform various operations like calculations, pivot tables,
graph tools, macro programming, etc.
• An Excel spreadsheet can be understood as a collection of columns and rows that form a table.
Alphabetical letters are usually assigned to columns, and numbers are usually assigned to rows. The point
where a column and a row meet is called a cell.
MS-EXCEL is part of the Microsoft Office suite. It is an electronic spreadsheet with numerous rows and columns, used for organizing data, representing data graphically, and performing various calculations. A worksheet (Excel 2007 and later) has 1,048,576 rows and 16,384 columns (A to XFD); a row and a column together make a cell.
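Column letters behave like a base-26 numbering system with no zero digit (A = 1, ..., Z = 26, AA = 27). This is how a cell address such as B3 maps to a row number and a column number, and how the label of the last column, XFD, corresponds to column 16,384. The Python sketch below illustrates that mapping; the function names are hypothetical helpers written for this example, not part of Excel or any library.

```python
def column_letters_to_index(letters: str) -> int:
    """Convert a column label such as "A", "AA" or "XFD" to its 1-based column number."""
    if not letters:
        raise ValueError("column label must not be empty")
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Each letter is a base-26 digit with A = 1 ... Z = 26 (there is no zero digit).
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def index_to_column_letters(index: int) -> str:
    """Convert a 1-based column number back to its letter label."""
    if index < 1:
        raise ValueError("column number must be >= 1")
    letters = []
    while index > 0:
        # Shift to 0-based before dividing, because the digits run from 1 to 26.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def cell_reference(row: int, column: int) -> str:
    """Build an A1-style address such as "B3" from 1-based row and column numbers."""
    return f"{index_to_column_letters(column)}{row}"


print(column_letters_to_index("XFD"))  # 16384
print(cell_reference(3, 2))            # B3
```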
• Understanding the Ribbon
• The ribbon provides shortcuts to commands in Excel. A command is an action that the user performs. Examples of commands are creating a new document, printing a document, etc.
You can perform statistical analysis with the help of Excel. It is widely used by analysts who need to understand statistical concepts and the behaviour of their data. But when the dataset is huge, or you need a specialized data analysis model (for example, an advanced regression model), you should consider more advanced tools such as Python or R. Here, we will go through the basic concepts of statistical analysis and apply them to our own data.
Before starting, you need to check whether the Analysis ToolPak (an add-in provided with Microsoft Excel) is enabled. To check, open the Data tab and see whether the Data Analysis command appears at the right end of the ribbon.
• If it is not there, go to File → Options → Add-ins, select Excel Add-ins in the Manage box, and click Go. In the small window that opens, select the Analysis ToolPak check box and click OK to enable it.
• Descriptive Analysis
• You can find descriptive analysis under Data → Data Analysis → Descriptive Statistics. It is the most basic set of analyses that can be performed on any dataset, and it shows the general behaviour and pattern of the data. It is helpful when you have a set of data and want a summary of it. The tool shows the following statistics for the chosen dataset:
• Mean and Standard Error
• Median, Mode and Standard Deviation
• Sample Variance
• Kurtosis and Skewness
• Range, Minimum, Maximum, Sum and Count
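To see what each of these statistics means, here is a minimal Python sketch that computes them from a list of numbers using only the standard library. It assumes sample (n - 1) formulas and uses the adjusted skewness and kurtosis formulas documented for Excel's SKEW and KURT functions; `descriptive_statistics` is a hypothetical helper, and its output is meant to resemble, not reproduce, the Analysis ToolPak table.

```python
import math
import statistics


def descriptive_statistics(values):
    """Return a Descriptive Statistics style summary for a sample of numbers."""
    data = list(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values for skewness and kurtosis")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")
    z = [(x - mean) / sd for x in data]  # standardized values
    # Adjusted sample skewness and excess kurtosis (the formulas behind Excel's SKEW and KURT).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # First value with the highest count; None stands in for Excel's #N/A when nothing repeats.
    candidate = statistics.multimode(data)[0]
    mode = candidate if data.count(candidate) > 1 else None
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": mode,
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for name, value in descriptive_statistics([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(f"{name:<20}{value}")
```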
• ANOVA (Analysis Of Variance)
• ANOVA is a data analysis method that shows whether the means of two or more groups differ significantly from each other. In other words, it analyses two or more groups simultaneously and tests whether any difference exists among the group means. For example, you can use ANOVA to compare traffic data from three different cities and find out whether there are significant differences among them; if there are, a follow-up (post hoc) comparison such as Tukey's method is needed to identify which city differs, for example which one handles traffic most efficiently.
Excel provides three types of ANOVA:
1. Anova: Single Factor
2. Anova: Two-Factor With Replication
3. Anova: Two-Factor Without Replication
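Single-factor ANOVA compares the spread between group means with the spread within groups through the F ratio. The Python sketch below computes that ratio; `one_way_anova` and the traffic numbers are illustrative assumptions. It returns only F and its degrees of freedom: turning F into a p-value needs the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)` in SciPy), or you can compare F with the F crit value that Excel reports.

```python
def one_way_anova(*groups):
    """Single-factor ANOVA: return (F, df_between, df_within) for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups: distance of each group mean from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: distance of each observation from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no variation within groups; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    # F = mean square between / mean square within.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical daily traffic counts (thousands of vehicles) for three cities.
city_a = [42, 45, 47, 44, 46]
city_b = [50, 52, 49, 53, 51]
city_c = [43, 44, 46, 45, 42]
print(one_way_anova(city_a, city_b, city_c))
```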
• Moving Average
• A moving average is commonly applied to time series data such as stock prices, weather records and class attendance. For example, it is heavily used as a technical indicator for stock prices. If you want to predict today's stock price, the last ten days of data are likely to be more relevant than the last year. So, you can plot the 10-day moving average of the stock and use it to predict the price to some extent. The same applies to the temperature of a city: its recent temperature is better estimated by averaging the last few weeks than the previous months.
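A simple (trailing) moving average replaces each point with the mean of the last `window` observations. The Python sketch below assumes equal weights, as in a simple moving average; `moving_average` and the prices are hypothetical. As far as we know, Excel's Moving Average tool fills the first (interval - 1) output cells with #N/A, whereas this sketch simply returns fewer values than it receives.

```python
def moving_average(values, window):
    """Simple trailing moving average; result[i] is the mean of values[i : i + window]."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    running_sum = sum(values[:window])
    averages = [running_sum / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the newest value and drop the oldest one.
        running_sum += values[i] - values[i - window]
        averages.append(running_sum / window)
    return averages


# Hypothetical closing prices for 12 trading days; a 10-day window gives 3 averages.
closing_prices = [101.0, 102.5, 101.8, 103.2, 104.0, 103.5,
                  105.1, 106.0, 105.4, 107.2, 108.0, 107.5]
print(moving_average(closing_prices, 10))
```

Keeping a running sum avoids re-adding the whole window at every step, which matters for long series.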
• Regression
• Regression is a process of establishing a relationship among variables. Usually, we model the relationship between a dependent variable and one or more independent variables. For example, you can regress a product's revenue on its advertising spend to see whether revenue has increased beyond what the increase in advertising explains.
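For one independent variable, ordinary least squares gives the slope as the ratio of the covariance of x and y to the variance of x. The following Python sketch fits y = intercept + slope * x and reports R squared; `simple_linear_regression` and the advertising/revenue numbers are hypothetical. Excel's Regression tool reports these quantities along with standard errors, t statistics and p-values, which this sketch omits.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return intercept, slope, r_squared


# Hypothetical monthly advertising spend (x) and revenue (y), both in thousands.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [12.1, 14.2, 15.8, 18.3, 19.9]
print(simple_linear_regression(advertising, revenue))
```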
• Sampling
• The Sampling tool creates a sample from a large population. You can select data randomly from the dataset or select every nth item (periodic sampling). For example, to measure the effectiveness of an employee in a call center, you can use this tool to randomly select a few recorded calls every month, listen to them, and give a rating based on the selected calls.
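The two sampling methods can be sketched in Python with the standard `random` module. `periodic_sample` follows the Excel convention as we understand it (take the period-th value and every period-th value after it), and `random_sample` draws with replacement by default, since Excel's random method may pick the same value more than once; both names, and the call IDs in the usage lines, are hypothetical.

```python
import random


def periodic_sample(data, period):
    """Return the period-th item and every period-th item after it (period=3 -> 3rd, 6th, ...)."""
    if period < 1:
        raise ValueError("period must be >= 1")
    return list(data)[period - 1::period]


def random_sample(data, size, seed=None, with_replacement=True):
    """Draw `size` items at random; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    population = list(data)
    if with_replacement:
        return rng.choices(population, k=size)  # the same item can be drawn more than once
    return rng.sample(population, size)         # every drawn item is distinct


# Hypothetical IDs of 300 recorded calls from one month.
call_ids = list(range(1, 301))
print(random_sample(call_ids, 5, seed=2024, with_replacement=False))
print(periodic_sample(call_ids, 60))  # [60, 120, 180, 240, 300]
```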
SPSS
• SPSS (Statistical Package for the Social Sciences) is a versatile and responsive program designed to
undertake a range of statistical procedures. SPSS software is widely used in a range of disciplines and is
available from all computer pools within the University of South Australia.
• SPSS is available for Windows, macOS and Linux and can be used to perform data entry and analysis and to create tables and graphs. SPSS is capable of handling large amounts of data and can perform all of the analyses covered in this unit and much more. SPSS is commonly used in the social sciences and in the business world.
Task: Open SPSS
Click the Start menu > All Programs > IBM SPSS Statistics > IBM SPSS Statistics 21 (or whatever the latest version number is) to open the SPSS program.
• Layout of SPSS
The Data Editor window has two views that can be selected from the tabs in the lower left-hand corner of the screen. Data View is where you see the data you are using. Variable View is where you can specify the format of your data when you are creating a file, or check the format of a pre-existing file. The data in the Data Editor is saved in a file with the extension .sav.
• Data View: the spreadsheet that is visible when you first open the Data Editor; this sheet contains the data. Unlike MS Excel, you cannot enter formulas here, and variable names are defined in Variable View rather than typed into the grid.
• Variable View: contains information about the variables in the dataset, such as their names, types and labels.
Syntax
Another important window in the SPSS environment is the Syntax Editor. In earlier versions of SPSS, all procedures were submitted through syntax, which instructs SPSS on how to process your data. Using SPSS syntax gives you access to additional commands that are not available through the menus and dialog boxes, and syntax files can be saved and rerun later, allowing you to repeat an analysis.
From the menu in the Data Editor window
File >> New >> Syntax
Output Viewer
When you execute a command for a statistical analysis, whether through syntax or dialog boxes, the output is displayed in the Output Viewer.
From the menu in the Data Editor window
File >> New >> Output
DESIGN OF EXPERIMENTS
• DOE is an essential tool for ensuring that products and processes satisfy the Quality by Design (QbD) requirements of regulatory agencies. Using a QbD approach to develop your testing process can help you reduce waste, meet compliance criteria and get to market faster.
• DOE helps you create a reliable QbD process for assessing formula robustness, determining critical quality attributes, and predicting shelf life from a few months of historical data.
Why Use a Quality By Design Approach?
Using a Quality by Design (QbD) approach to develop the testing process and to choose the critical quality
attributes for a pharmaceutical product can help to:
• Ensure products meet defined critical quality attributes
• Meet regulatory compliance criteria
• Predict formula robustness
• Reduce waste in production
• Get to market faster
Using DOE to Optimize Processes
• When it comes to creating an optimal manufacturing process that limits variation and conserves energy or resources, or developing a new formula that is most likely to meet customer expectations, design of experiments (DOE) is an indispensable tool.
DOE helps you to:
• Minimize the number of experiments you have to do to find the ideal formula or recipe
• Create a robust process (one that holds up to changes in environment, humidity, ingredient variation,
etc.)
• Adapt a recipe for changes in ingredients or packaging needs (availability, eco-compliance, regulations,
consumer trends, etc.)
Using DOE to Predict Formula Robustness
• Being able to demonstrate product robustness and deliver the intended quality of the product within allowable ranges for the claimed shelf-life period is critical for pharmaceutical manufacturers. Both international and country-specific regulatory agencies, such as the FDA, pay close attention to shelf-life claims.
• Predicting formulation robustness requires a careful design of experiments that holds up under statistical
analysis. Using DOE for formulation robustness studies can help you select a commercial formulation
that is sufficiently robust within the acceptable ranges around the label claim to meet the shelf life
stability requirements.
Steps to Predict Formula Robustness
Step 1: Choose the Right Measurement Factors
• Ensure that the factors selected to study can be used to predict an acceptable formulation parameter
range where all the values for the assessed quality attributes will be inside the specified limits.
Step 2: Design a Statistically Valid Study
• Consider how the factors being investigated fit into a full factorial design. For pharma companies, for
example, robustness studies must be able to prove that specific critical quality attributes stay within the
acceptable ranges for the entire shelf-life period. In addition:
• The study must result in a regression model that is statistically significant
• The study must provide output parameters (quality attributes) that are within predefined limits
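A full factorial design runs every combination of factor levels, so k factors at two levels each need 2^k runs. The Python sketch below enumerates such a design with `itertools.product`; the factor names and levels (binder concentration, compression force, granulation time) are hypothetical, and a real study would normally also randomize the run order (the optional `seed` does this) and may add centre points.

```python
import random
from itertools import product


def full_factorial(levels, seed=None):
    """List every combination of factor levels as a dict; shuffle the run order if a seed is given."""
    names = list(levels)
    runs = [dict(zip(names, combo)) for combo in product(*(levels[name] for name in names))]
    if seed is not None:
        random.Random(seed).shuffle(runs)  # a randomized run order guards against time trends
    return runs


# Hypothetical tablet-formulation factors, each studied at a low and a high level.
factors = {
    "binder_pct": [2, 4],
    "compression_kN": [10, 15],
    "granulation_min": [5, 10],
}
for run_number, run in enumerate(full_factorial(factors), start=1):
    print(run_number, run)
```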
Step 3: Analyze the Data Using Multiple Linear Regression
• One important way to produce a valid testing model is to use a tool that makes design of experiments easier. For example, MODDE® Design of Experiments Software can help you set up multivariate formulation robustness studies that demonstrate the acceptable ranges of quality for a target composition, define the allowable edges of the composition range, and predict the stability requirements needed to reach the end of shelf life.
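With factors coded as -1 (low) and +1 (high), the columns of a two-level full factorial are orthogonal, so the least-squares coefficients of a main-effects multiple linear regression reduce to simple averages. The Python sketch below shows that shortcut under those assumptions (main effects only, no interaction terms, no significance tests); it is not a description of how MODDE® fits its models, and `fit_coded_two_level_factorial` and the assay values are hypothetical.

```python
from itertools import product


def fit_coded_two_level_factorial(responses, n_factors):
    """Main-effects least-squares fit y = b0 + b1*x1 + ... + bk*xk for a coded 2^k full factorial.

    `responses` must follow the run order of itertools.product((-1, 1), repeat=n_factors),
    i.e. the first factor changes slowest.
    """
    runs = list(product((-1, 1), repeat=n_factors))
    if len(responses) != len(runs):
        raise ValueError(f"expected {len(runs)} responses, got {len(responses)}")
    n = len(runs)
    # Coded columns are orthogonal (X'X = n * I), so each coefficient is an average of x_j * y.
    intercept = sum(responses) / n
    coefficients = [sum(run[j] * y for run, y in zip(runs, responses)) / n
                    for j in range(n_factors)]
    return intercept, coefficients


# Hypothetical assay results (% of label claim) for a 2^2 design, in product order.
print(fit_coded_two_level_factorial([97.8, 99.1, 98.4, 100.2], 2))
```

For designs that are not orthogonal (missing runs, centre points, interaction or quadratic terms), the coefficients have to come from a general least-squares solver instead of this shortcut.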
DOE EduPack is designed to give students hands-on skills to solve problems and learn:
• How to create efficient experimental designs to match the objectives
• How to analyze data based on sound statistical principles to evaluate the results of experiments
• How to interpret results by using graphical and statistical tools
• How to convert modeling results into concrete action with MODDE® optimizer & verifying experiments
• How to define a design space and find robust setpoints
• APPLICATIONS OF DESIGN OF EXPERIMENTS IN QbD AND AQbD
The Quality by Design approach was accepted by the FDA in 2004 and described in 'Pharmaceutical cGMPs for the 21st Century: A Risk-Based Approach'.
• The International Council for Harmonisation (ICH; formerly the International Conference on Harmonisation) guidelines Q8 (Pharmaceutical Development), Q9 (Quality Risk Management) and Q10 (Pharmaceutical Quality System) provide detailed requirements regarding pharmaceutical product quality.
• QbD and DoE approaches help to implement ICH Q8 and ICH Q9.
• Since the QbD approach was accepted by the FDA, DoE has been widely employed to provide a complete understanding of the product and its manufacturing process. Many applications of DoE for screening and optimizing pharmaceutical products and their manufacturing processes can be found in the literature. Several input factors (independent variables), such as excipient concentrations, stirring time, stirring speed, temperature and pressure, among others, may be screened and optimized using DoE. Studied output responses (dependent variables) include particle size, entrapment efficiency and dissolution rate, among others.
• Applying screening designs in pharmaceutical QbD allows identification of the critical material attributes (CMAs) and critical process parameters (CPPs) (independent variables) that affect the critical quality attributes (CQAs) (dependent variables) and, therefore, the quality target product profile (QTPP). In addition, optimization designs, response surface methodology and multiple response optimization allow a design space to be defined within which the CQAs and QTPP are met. Adopting a design space based on product and process understanding allows regulatory flexibility, because changes within the design space do not require prior regulatory approval.
• Recently, DoE has also been used in the rational development and optimization of analytical methods. Culture media composition, mobile phase composition, flow rate and incubation time are examples of input factors (independent variables) that may be screened and optimized using DoE. Output responses (dependent variables) such as retention time, resolution between peaks and microbial growth, among others, have been reported in the literature.
MINITAB
• Minitab is a statistics package developed at the Pennsylvania State University by researchers Barbara F.
Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner in conjunction with Triola Statistics Company in 1972.
• It began as a light version of OMNITAB 80, a statistical analysis program by NIST, which was conceived by Joseph Hilsenrath in 1962-1964 as the OMNITAB program for the IBM 7090. The documentation for OMNITAB 80 was last published in 1986, and there has been no significant development since then.
• In 2020, during the COVID-19 pandemic, Minitab LLC requested and received between $5 million and $10 million under the Paycheck Protection Program to avoid having to let 250 employees go. As of 2021, Minitab LLC had subsidiaries in the UK, France, Germany, Hong Kong, and Australia.
• Minitab is a statistics package developed to help Six Sigma professionals analyse and interpret data to improve business processes. Data input is simplified so that the package can easily be used for statistical analysis, and it also helps in manipulating the dataset.
• Key Features of Minitab
1. Basic Statistics: This feature covers all kinds of statistical tests, descriptive statistics, correlations, and covariances.
2. Graphics: This enables users to draw various statistical graphs such as scatter plots, histograms, boxplots, matrix plots, marginal plots, bubble charts etc.
3. Regression: This feature enables users to find the relationship between variables (a key feature of any statistical tool). Regression is available in linear, nonlinear, ordinal, nominal and other forms.
4. Analysis of Variance: Analysis of variance (ANOVA) is used to analyse the differences between group means.
5. Statistical Process Control: This feature helps you create cause-and-effect diagrams, variables control charts, multivariate control charts, time-weighted charts, etc.
6. Measurement System Analysis: MSA is a mathematical method to determine the amount of variation that exists within a measurement process. Variability in the measurement process can directly add to the observed overall variance of a process.
7. Design of Experiments: This feature helps you identify cause-and-effect relationships. It helps you create and run various designs and record all of their relevant outputs, which helps you finalize a method and optimize it.
8. Reliability/Survival: It helps you select the distribution that best models your data.
• One of the most common methods used in statistical analysis is hypothesis testing. Minitab offers many
hypothesis tests, including t-tests and ANOVA (analysis of variance). Usually, when you perform a
hypothesis test, you assume an initial claim to be true, and then test this claim using sample data.
• Hypothesis tests include two hypotheses (claims), the null hypothesis (H0) and the alternative hypothesis
(H1). The null hypothesis is the initial claim and is often specified based on previous research or
common knowledge. The alternative hypothesis is what you believe might be true.
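The logic of H0 versus H1 can be illustrated without distribution tables by a permutation test, which is a different technique from the t-tests Minitab offers: if H0 (no difference between two groups) is true, shuffling the group labels should produce mean differences as large as the observed one fairly often. The Python sketch below estimates a two-sided p-value that way; the function name, the 10,000 permutations, the fixed seed and the measurements are assumptions chosen for the example. A small p-value (conventionally below 0.05) is evidence against H0.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test of H0: both groups come from the same distribution.

    Returns (observed difference in means, estimated p-value).
    """
    a, b = list(group_a), list(group_b)
    if not a or not b:
        raise ValueError("both groups need at least one value")
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = a + b
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are arbitrary, so reassign them at random.
        rng.shuffle(pooled)
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        # A tiny tolerance absorbs floating-point rounding when sums are taken in a new order.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed split itself (+1) keeps the p-value estimate above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical measurements from two production lines.
line_1 = [10.2, 10.5, 10.1, 10.8, 10.4]
line_2 = [9.6, 9.9, 9.7, 10.0, 9.5]
observed, p = permutation_test(line_1, line_2)
print(f"difference = {observed:.2f}, p = {p:.4f}")
print("reject H0 at 0.05" if p < 0.05 else "fail to reject H0 at 0.05")
```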
Perform an ANOVA
1. Choose Stat > ANOVA > One-Way.
2. Choose Response data are in one column for all factor levels.
3. In Response, enter Days. In Factor, enter Center.
4. Click Comparisons.
5. Under Comparison procedures assuming equal variances, check Tukey.
6. Click OK.
7. Click Graphs. For many statistical commands, Minitab includes graphs that help you interpret the results and assess the validity of statistical assumptions. These graphs are called built-in graphs.
8. Under Data plots, check Interval plot, Individual value plot, and Boxplot of data.
9. Under Residual plots, choose Four in one.
10. Click OK in each dialog box.
R-ONLINE
• "R is a language and environment for statistical computing and graphics."
• "R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible."
• "One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed."
Importance of R Programming Language
• R is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.
• R provides graphical facilities for data analysis and display.
• R is a very flexible language. Not everything has to be done in R itself; it can use other tools, such as code written in C and C++, when required.
• R has an effective data handling and storage facility.
• R provides an extensive, coherent, and integrated collection of tools for data analysis.
• R also includes a package system that allows users to add their own functionality in a manner that is indistinguishable from the core of R.
• R is actively used for statistical computing and design, and it has contributed substantially to big data and data analytics. It is one of the most widely used languages in data science, and large companies such as Google, LinkedIn, and Facebook are reported to rely on R for many of their operations.
Programming Features of R
• R has various programming features, which we discuss below:
1. Data Inputs and Data Management
• Data inputs: data types, importing data, keyboard entry.
• Data management: data variables, operators.
2. Distributed Computing and R Packages
• Distributed R is an open-source, high-performance platform for the R language. It splits tasks between multiple processing nodes to reduce execution time and analyse large datasets.
• R Packages: R packages are collections of R functions, compiled code and sample data. By default, R installs a set of packages during installation.
Advantages and Disadvantages of R Programming
• There are several benefits and some limitations of the R programming language. Let us discuss them one by one:
• Pros of R Language
• R is one of the most comprehensive statistical analysis packages, as new technology and ideas often appear first in R.
• It is cross-platform and runs on many operating systems, including GNU/Linux, macOS and Microsoft Windows.
• In R, everyone is welcome to provide bug fixes, code enhancements, and new packages.
• Cons of R Language
• The quality of some packages in R is less than perfect.
• There is no official customer support for R that you can contact if something doesn't work.
• R gives little attention to memory management: it typically holds data in memory, so R can consume all the available memory.
• USE OF R-PROGRAMMING FOR CLINICAL TRIAL DATA ANALYSIS
• The use of R programming in clinical trials has not been the most popular or obvious choice. Despite its growth over the past few years, its practical use still seems to be hindered by several factors, sometimes due to misunderstandings (e.g. about validation) but also because of a lack of knowledge of its capabilities. Despite these bottlenecks, R is clearly creating its own (larger by the day) niche in the pharmaceutical industry.
• R can be used to create TLFs (tables, listings and figures) much like the current combination of PROC REPORT/PROC TABULATE and ODS does in SAS®. This shows its power and capability to play an important role in our industry in the years to come, not as a replacement for SAS®, but rather as an alternative option.
USES OF R-PROGRAMMING
Although R is a popular language used by many programmers, it is especially effective when used for:
• Data analysis
• Statistical inference
• Machine learning algorithms
R offers a wide variety of statistics-related libraries and provides a favourable environment for statistical computing and design. In addition, many quantitative analysts use R as a programming tool because it is useful for data importing and cleaning.
As of August 2021, R was ranked among the top five programming languages of the year, and it is a favourite among data analysts and research programmers. It is also used as a fundamental tool in finance, which relies heavily on statistical data.
The Popularity of R by Industry
Thanks to its versatility, many different industries use the R programming language. Here is a list of
disciplines that use the R programming language:
• Fintech companies (financial services)
• Academic research
• Government (FDA, National Weather Service)
• Retail
• Social media
• Data journalism
• Manufacturing
• Healthcare

 

UNIT 4.pptx

  • 1. BIOSTATISTICS AND RESEARCH METHODOLOGY Unit-4: statistical analysis PRESENTED BY Himanshu Rasyara B. Pharmacy IV Year UNDER THE GUIDANCE OF Gangu Sreelatha M.Pharm., (Ph.D) Assistant Professor CMR College of Pharmacy, Hyderabad. email: sreelatha1801@gmail.com
  • 2. • What is Microsoft Excel? • Microsoft Excel is a spreadsheet program used to record and analyse numerical and statistical data. It provides features for calculations, pivot tables, charts, macro programming, and more. • An Excel spreadsheet can be understood as a collection of columns and rows that form a table. Columns are labelled with letters and rows with numbers; the point where a column and a row meet is called a cell. MS Excel is part of the Microsoft Office suite. It is an electronic spreadsheet used for organizing data, representing data graphically, and performing calculations. A worksheet has 1,048,576 rows and 16,384 columns (A to XFD).
  • 3. • Understanding the Ribbon • The ribbon provides shortcuts to commands in Excel. A command is an action that the user performs, such as creating a new document or printing a document. Excel can also be used for statistical analysis, and it is widely used by analysts who need to understand statistical concepts and the behaviour of their data. However, when the data set is very large or you need a specialized model (for example, complex regression models), more powerful tools such as Python or R are a better choice. Here, we go through the basic concepts of statistical analysis and apply them to our own data. Before starting, check whether the Analysis ToolPak (an add-in supplied with Microsoft Excel) is enabled: open the Data tab and check whether the Data Analysis button appears at the right end of the ribbon.
  • 4. • If it is not there, go to File → Options → Add-ins, select Excel Add-ins in the Manage box, and click Go. In the small window that opens, tick Analysis ToolPak and click OK. • Descriptive Analysis • You can run a descriptive analysis from Data → Data Analysis → Descriptive Statistics. It is the most basic analysis that can be performed on any data set: it summarizes the general behaviour and pattern of the data, which is helpful when you want a quick summary of a dataset. With the Summary statistics option ticked, it reports the following for the chosen dataset: • Mean, Standard Error and Median • Mode and Standard Deviation • Sample Variance • Kurtosis and Skewness • Range, Minimum, Maximum, Sum and Count • ANOVA (Analysis Of Variance) • ANOVA tests whether the means of two or more groups differ significantly from each other. In other words, it analyses two or more groups simultaneously and tells you whether at least one group mean differs from the others. For example, you could use ANOVA to compare the average traffic of three cities and find out whether there is a significant difference among them. Excel offers three types of ANOVA: 1. Anova: Single Factor 2. Anova: Two-Factor With Replication 3. Anova: Two-Factor Without Replication
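To make the Descriptive Statistics output less of a black box, here is a minimal Python sketch that computes the same list of summary values. It assumes Excel's conventions: sample standard deviation and variance (dividing by n − 1), excess kurtosis as in the worksheet function KURT, and skewness as in SKEW. The function name `describe` and the tablet-weight data are hypothetical and used only for illustration.

```python
import math
import statistics


def describe(data):
    """Summary statistics in the spirit of Excel's Descriptive Statistics tool (sketch)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values to compute kurtosis")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)           # sample standard deviation (n - 1)
    z = [(x - mean) / sd for x in data]   # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),    # Excel shows #N/A when no value repeats
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,                 # excess kurtosis, as in Excel's KURT
        "Skewness": skew,                 # same formula as Excel's SKEW
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


if __name__ == "__main__":
    tablet_weights = [248, 250, 251, 249, 250, 252, 247, 250]  # hypothetical data (mg)
    for name, value in describe(tablet_weights).items():
        print(f"{name:20s} {round(value, 4)}")
```

One difference to keep in mind: for data with no repeated value, Python's `statistics.mode` (3.8+) returns the first value, whereas Excel reports #N/A.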
  • 5. • Moving Average • A moving average is usually applied to time-series data such as stock prices, weather readings, or class attendance. For example, it is heavily used as a technical indicator for stock prices: when estimating today's price, the last ten days of data are more relevant than the last year, so you can plot a 10-day moving average and use it to anticipate the price to some extent. The same applies to temperature: a city's recent temperature is better estimated by averaging the last few weeks than the previous months. • Regression • Regression establishes a relationship between variables, usually between a dependent variable and one or more independent variables. For example, you can use regression to estimate how much of a change in a product's revenue is explained by advertising spend and how much is due to other factors.
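The moving-average idea above can be sketched in a few lines of Python. This is a trailing simple moving average: each output is the mean of the current value and the previous `window − 1` values. Positions without enough history return `None`, which we assume corresponds to the #N/A cells Excel's Moving Average tool leaves at the start of the output. The function name and the price series are illustrative only.

```python
from collections import deque


def moving_average(values, window):
    """Trailing simple moving average; None where fewer than `window` points exist."""
    if window < 1:
        raise ValueError("window must be >= 1")
    result = []
    recent = deque()          # values currently inside the window
    running_total = 0.0
    for v in values:
        recent.append(v)
        running_total += v
        if len(recent) > window:
            running_total -= recent.popleft()  # drop the value that left the window
        result.append(running_total / window if len(recent) == window else None)
    return result


if __name__ == "__main__":
    closing_prices = [10, 11, 12, 13, 14]  # hypothetical daily prices
    print(moving_average(closing_prices, 3))
```

A 10-day average, as in the stock example, would simply use `window=10`.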
  • 6. • Sampling • The Sampling tool creates samples from a large population. You can select items at random or select every nth item (periodic sampling). For example, to assess a call-centre employee, you can use this tool to select a few calls at random each month, listen to the recordings, and rate the employee on the selected calls.
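The two selection methods described above (every nth item, or items at random) can be sketched in Python as follows. This is an illustration, not a reproduction of Excel's tool: the random method here uses `random.sample`, which draws *without* replacement, and the optional seed is an assumption added to make results reproducible. The call IDs are hypothetical.

```python
import random


def periodic_sample(population, period):
    """Every `period`-th item, starting with the period-th one (e.g. 25th, 50th, ...)."""
    if period < 1:
        raise ValueError("period must be >= 1")
    return population[period - 1::period]


def random_sample(population, k, seed=None):
    """k items chosen at random without replacement; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(population, k)


if __name__ == "__main__":
    calls = [f"call-{i:03d}" for i in range(1, 101)]  # hypothetical month of 100 calls
    print(periodic_sample(calls, 25))
    print(random_sample(calls, 5, seed=42))
```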
  • 7. SPSS • SPSS (Statistical Package for the Social Sciences) is a versatile and responsive program designed to undertake a range of statistical procedures. SPSS software is widely used in a range of disciplines and is available from all computer pools within the University of South Australia. • SPSS is a desktop program (the steps here assume Windows) that can be used for data entry and analysis and to create tables and graphs. It can handle large amounts of data and can perform all of the analyses covered in the text and much more. SPSS is commonly used in the social sciences and in the business world. Task: Open SPSS. Click on the Start menu > All Programs > IBM SPSS Statistics > IBM SPSS Statistics 21 (or the latest version number) to open the SPSS program. • Layout of SPSS: The Data Editor window has two views, selected from the lower left-hand side of the screen. Data View is where you see the data you are using. Variable View is where you specify the format of your data when creating a file, or check the format of a pre-existing file. The data in the Data Editor is saved in a file with the extension .sav. • Data View: the spreadsheet that is visible when you first open the Data Editor; this sheet contains the data. Unlike MS Excel, you cannot enter formulas or variable names here. • Variable View: contains information about the variables in the data set.
  • 8. Syntax Another important window in the SPSS environment is the Syntax Editor. In earlier versions of SPSS, all procedures were submitted through syntax, which instructs SPSS how to process your data. Using SPSS syntax gives you access to additional commands that are not available through the menus and dialog boxes, and syntax files can be saved and rerun later, allowing you to repeat an analysis. To open a new syntax window, choose File >> New >> Syntax from the menu in the Data Editor window.
  • 9. Output Viewer When you run a statistical analysis, whether through syntax or dialog boxes, the output is displayed in the Output Viewer. To open a new output window, choose File >> New >> Output from the menu in the Data Editor window.
  • 10. DESIGN OF EXPERIMENTS • Design of experiments (DOE) is an essential tool for ensuring that products and processes satisfy the Quality by Design (QbD) requirements imposed by regulatory agencies. Using a QbD approach to develop your testing process can help you reduce waste, meet compliance criteria and get to market faster. • DOE helps you create a reliable QbD process for assessing formula robustness, determining critical quality attributes and predicting shelf life from a few months of historical data. Why Use a Quality by Design Approach? Using a QbD approach to develop the testing process and to choose the critical quality attributes for a pharmaceutical product can help to: • Ensure products meet defined critical quality attributes • Meet regulatory compliance criteria • Predict formula robustness • Reduce waste in production • Get to market faster
  • 11. Using DOE to Optimize Processes • When it comes to creating an optimal manufacturing process that limits variation and conserves energy or resources, or developing a new formula that is most likely to meet customer expectations, design of experiments (DOE) is an indispensable tool. DOE helps you to: • Minimize the number of experiments needed to find the ideal formula or recipe • Create a robust process (one that holds up to changes in environment, humidity, ingredient variation, etc.) • Adapt a recipe to changes in ingredients or packaging needs (availability, eco-compliance, regulations, consumer trends, etc.) Using DOE to Predict Formula Robustness • Being able to demonstrate product robustness and deliver the intended quality of the product within allowable ranges for the claimed shelf-life period is critical for pharmaceutical manufacturers. Both international and country-specific regulatory agencies, such as the FDA, pay close attention to shelf-life claims. • Predicting formulation robustness requires a careful design of experiments that holds up under statistical analysis. Using DOE for formulation robustness studies can help you select a commercial formulation that is sufficiently robust within the acceptable ranges around the label claim to meet the shelf-life stability requirements.
  • 12. Steps to Predict Formula Robustness Step 1: Choose the Right Measurement Factors • Ensure that the factors selected for study can be used to predict an acceptable range of formulation parameters within which all values of the assessed quality attributes stay inside the specified limits. Step 2: Design a Statistically Valid Study • Consider how the factors being investigated fit into a full factorial design. For pharma companies, for example, robustness studies must be able to prove that specific critical quality attributes stay within the acceptable ranges for the entire shelf-life period. In addition: • The study must result in a regression model that is statistically significant • The study must provide output parameters (quality attributes) that are within predefined limits Step 3: Analyze the Data Using Multiple Linear Regression • One important way to produce a valid testing model is to use a tool that makes design of experiments easier. For example, MODDE® Design of Experiments Software can help you set up multivariate formulation robustness studies that demonstrate the acceptable ranges of quality for a target composition, define the allowable edges of the composition range, and predict the stability requirements needed to reach the end of shelf life.
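To show what a full factorial design looks like in practice, here is a minimal Python sketch of a two-level design with coded levels −1 (low) and +1 (high), plus the main effect of each factor (mean response at +1 minus mean response at −1). It does not reproduce MODDE or any other tool; the factor names and dissolution values are hypothetical.

```python
from itertools import product


def full_factorial(factors):
    """Every combination of coded levels -1 (low) and +1 (high) for the given factors."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = {}
    for factor in design[0]:
        high = [y for run, y in zip(design, responses) if run[factor] == 1]
        low = [y for run, y in zip(design, responses) if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


if __name__ == "__main__":
    design = full_factorial(["binder", "speed"])  # hypothetical factors, 2^2 = 4 runs
    dissolution = [70, 74, 80, 86]                # hypothetical % dissolved, one per run
    for run, y in zip(design, dissolution):
        print(run, y)
    print(main_effects(design, dissolution))
```

With k factors the design has 2^k runs. In this balanced, coded design the least-squares regression coefficient of each factor equals half its main effect, which links Step 2 to the regression analysis in Step 3.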
  • 13. DOE EduPack is designed to give students hands-on skills to solve problems and learn: • How to create efficient experimental designs to match the objectives • How to analyze data based on sound statistical principles to evaluate the results of the experiments • How to interpret results by using graphical and statistical tools • How to convert modeling results into concrete action with the MODDE® optimizer and verifying experiments • How to define a design space and find robust setpoints • APPLICATIONS OF DESIGN OF EXPERIMENTS IN QbD AND AQbD The Quality by Design approach was accepted by the FDA in 2004 and described in "Pharmaceutical cGMPs for the 21st Century: A Risk-Based Approach". • The International Conference on Harmonisation (ICH) guidelines Q8 (Pharmaceutical Development), Q9 (Quality Risk Management), and Q10 (Pharmaceutical Quality System) provide detailed requirements regarding pharmaceutical product quality. • QbD and DoE approaches help to implement ICH Q8 and ICH Q9.
  • 14. • Since the QbD approach was accepted by the FDA, DoE has been widely employed to provide a complete understanding of the product and its manufacturing process. Many applications of DoE for screening and optimizing pharmaceutical products and their manufacturing processes can be found in the literature. Several input factors (independent variables), such as excipient concentrations, stirring time, stirring speed, temperature and pressure, among others, may be screened and optimized using DoE. Output responses (dependent variables) studied include particle size, entrapment efficiency and dissolution rate, among others. • Applying screening designs in pharmaceutical QbD identifies the critical material attributes (CMAs) and critical process parameters (CPPs) (independent variables) that affect the critical quality attributes (CQAs) (dependent variables) and, therefore, the quality target product profile (QTPP). In addition, optimization designs, response surface methodology and multiple response optimization make it possible to define a design space region in which the CQAs and QTPP are met. Adopting a design space based on product and process understanding allows regulatory flexibility, because changes within the design space do not require prior regulatory approval. • Recently, DoE has been used in the rational development and optimization of analytical methods. Culture media composition, mobile phase composition, flow rate and incubation time are examples of input factors (independent variables) that may be screened and optimized using DoE. Several output responses (dependent variables), such as retention time, resolution between peaks and microbial growth, among others, have been reported in the literature.
  • 17. MINITAB • Minitab is a statistics package developed at the Pennsylvania State University by researchers Barbara F. Ryan, Thomas A. Ryan, Jr., and Brian L. Joiner in conjunction with Triola Statistics Company in 1972. • It began as a light version of OMNITAB 80, a statistical analysis program by NIST, which was conceived by Joseph Hilsenrath in 1962-1964 as the OMNITAB program for the IBM 7090. The documentation for OMNITAB 80 was last published in 1986, and there has been no significant development since then. • In 2020, during the COVID-19 pandemic, Minitab LLC requested and received between $5 million and $10 million under the Paycheck Protection Program to avoid laying off 250 employees. As of 2021, Minitab LLC had subsidiaries in the UK, France, Germany, Hong Kong, and Australia. • Minitab is a statistics package developed to help Six Sigma professionals analyse and interpret data to improve business processes. Data input is simplified so that the data can easily be used for statistical analysis, and the package also helps in manipulating the dataset. • Key Features of Minitab 1. Basic Statistics: This feature covers a wide range of statistical tests, descriptive statistics, correlations, and covariances. 2. Graphics: This enables users to draw various statistical graphs such as scatter plots, histograms, boxplots, matrix plots, marginal plots, bubble charts, etc. 3. Regression: This feature enables users to find the relationship between variables (a key feature of any statistical tool). Regression is available in linear, nonlinear, ordinal logistic, nominal logistic and other forms.
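The core of the simplest regression, fitting a straight line y = b0 + b1·x by ordinary least squares, fits in a short Python function. This is only a sketch of the underlying calculation, not Minitab's implementation; the function name and the calibration-style data are hypothetical. It also returns R², the fraction of the variation in y explained by the line.

```python
def linear_regression(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; returns (b0, b1, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # must be > 0 (x values not all equal)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


if __name__ == "__main__":
    concentration = [2, 4, 6, 8, 10]             # hypothetical standards (ug/mL)
    absorbance = [0.11, 0.21, 0.29, 0.41, 0.50]  # hypothetical readings
    print(linear_regression(concentration, absorbance))
```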
  • 18. 4. Analysis of Variance: Analysis of variance (ANOVA) is used to analyse the differences between group means. 5. Statistical Process Control: This feature helps you create cause-and-effect diagrams, variable control charts, multivariate control charts, time-weighted charts, etc. 6. Measurement System Analysis: MSA is a mathematical method to determine the amount of variation that exists within a measurement process. Variability in the measurement process directly adds to the overall observed variance of a process. 7. Design of Experiments: This feature helps you identify cause-and-effect relationships. It lets you create and run various designs while recording all relevant outputs, which helps you finalize and optimize a method. 8. Reliability/Survival: It enables you to select the distribution that best describes your data for modelling. • One of the most common methods used in statistical analysis is hypothesis testing. Minitab offers many hypothesis tests, including t-tests and ANOVA (analysis of variance). Usually, when you perform a hypothesis test, you assume an initial claim to be true and then test this claim using sample data. • Hypothesis tests involve two hypotheses (claims): the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis is the initial claim and is often specified based on previous research or common knowledge. The alternative hypothesis is what you believe might be true.
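As a concrete instance of the H0/H1 procedure described above, here is a minimal Python sketch of the two-sample t statistic under the assumption of equal variances (pooled variance). It computes only the statistic and its degrees of freedom; turning that into a p-value needs the t distribution, which is not in Python's standard library, so the usage note compares against a tabulated critical value instead. The function name and data are hypothetical.

```python
import math
import statistics


def pooled_t_test(sample1, sample2):
    """Two-sample t statistic assuming equal variances.

    H0: the two population means are equal; H1: they differ.
    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error
    return t, n1 + n2 - 2


if __name__ == "__main__":
    method_a = [5, 6, 7, 8, 9]  # hypothetical measurements
    method_b = [3, 4, 5, 6, 7]
    print(pooled_t_test(method_a, method_b))
```

For this data t = 2.0 with 8 degrees of freedom. The two-sided 5% critical value from standard t tables is about 2.306, so |t| is below it and H0 would not be rejected at the 5% level.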
  • 19. Perform an ANOVA 1. Choose Stat > ANOVA > One-Way. 2. Choose Response data are in one column for all factor levels. 3. In Response, enter Days. In Factor, enter Center. 4. Click Comparisons. 5. Under Comparison procedures assuming equal variances, check Tukey.
  • 20. 6. Click OK. 7. Click Graphs. For many statistical commands, Minitab includes graphs that help you interpret the results and assess the validity of statistical assumptions. These are called built-in graphs. 8. Under Data plots, check Interval plot, Individual value plot, and Boxplot of data. 9. Under Residual plots, choose Four in one. 10. Click OK in each dialog box.
  • 21. R-ONLINE "R is a language and environment for statistical computing and graphics." • "R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible." • "One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed." Importance of R Programming Language • R is a well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities. • R provides graphical facilities for data analysis and display. • R is a very flexible language: not everything has to be done in R itself, and other tools such as C and C++ can be used where required. • R has effective data handling and storage facilities. • R provides an extensive, coherent, and integrated collection of tools for data analysis. • R also includes a package system that allows users to add their own functionality in a way that is indistinguishable from the core of R. • R is actively used for statistical computing and design and is widely used in data science and data analytics. Companies such as Google, LinkedIn, and Facebook are commonly cited as users of R.
  • 22. Programming Features of R • R has various programming features, discussed below: 1. Data Inputs and Data Management • Data inputs such as data types, importing data, and keyboard entry. • Data management such as data variables and operators. 2. Distributed Computing and R Packages • Distributed computing: Distributed R is an open-source, high-performance platform for the R language. It splits tasks between multiple processing nodes to reduce execution time and analyse large datasets. • R Packages: R packages are collections of R functions, compiled code and sample data. By default, R installs a set of packages during installation. • Advantages and Disadvantages of R Programming • The R programming language has several benefits and some limitations. Let us discuss them one by one: • Pros of R Language • R is among the most comprehensive statistical analysis packages, and new techniques and ideas often appear first in R. • It is cross-platform and runs on many operating systems, including GNU/Linux, macOS and Microsoft Windows.
  • 23. • In R, everyone is welcome to contribute bug fixes, code enhancements, and new packages. • Cons of R Language • The quality of some R packages is less than perfect. • There is no official customer support for R to contact if something does not work. • R commands pay little attention to memory management, so R can consume all the available memory. • USE OF R PROGRAMMING FOR CLINICAL TRIAL DATA ANALYSIS • R has not been the most popular or obvious choice in clinical trials. Despite its growth over the past few years, its practical use still seems to be hindered by several factors, sometimes by misunderstandings (e.g. about validation) and sometimes by a lack of knowledge of its capabilities. Despite these bottlenecks, R is steadily creating its own niche in the pharmaceutical industry. • R can be used to create TLFs (tables, listings and figures) much as the current combination of PROC REPORT/PROC TABULATE and ODS does in SAS®, which shows that R can play an important role in the industry in the years to come, not as a replacement for SAS® but as an alternative to it.
  • 24. USES OF R PROGRAMMING Although R is a popular language used by many programmers, it is especially effective for: • Data analysis • Statistical inference • Machine learning algorithms R offers a wide variety of statistics-related libraries and provides a favourable environment for statistical computing and design. In addition, many quantitative analysts use R as a programming tool because it is useful for importing and cleaning data. As of August 2021, R was ranked among the top five programming languages in some popularity rankings, and it is a favourite among data analysts and research programmers. It is also a fundamental tool in finance, which relies heavily on statistical data. The Popularity of R by Industry Thanks to its versatility, R is used in many different industries and disciplines, including: • Fintech companies (financial services) • Academic research • Government (FDA, National Weather Service) • Retail • Social media • Data journalism • Manufacturing • Healthcare