Pandas.pptx

• Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from the term "panel data", an econometrics term for multidimensional data sets.
• Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyze.

1. Consists of the basic files for the data structures present within the library.
2. Contains the algorithms that provide basic functionality to the library.
3. Contains input and output tools that help Pandas handle files of various formats.
4. Contains various functions for manipulating data sets.
5. Consists of sparse versions of various data structures, i.e., versions for data that is mostly missing or unavailable.
6. Contains various statistics-related functions.
7. Contains various utilities, testing tools, and development tools.
8. Using Pandas with both R and Python can give you a much better grasp of data analysis.

• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• High-performance merging and joining of data.
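Two of the features above, column insertion/deletion and merging, can be sketched with two small tables. The table contents and the `id` key column are made up for illustration; `DataFrame.insert`, `del`, and `DataFrame.merge` are standard pandas operations.

```python
import pandas as pd

# Two small tables sharing a hypothetical "id" key column
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [0.5, 0.7, 0.9]})

# Insert a new column at position 1, then delete it again
left.insert(1, "flag", [True, False, True])
del left["flag"]

# Inner join on the shared key: only ids 2 and 3 appear in both tables
merged = left.merge(right, on="id", how="inner")
print(merged)
```

Changing `how` to `"left"`, `"right"`, or `"outer"` keeps unmatched keys from one or both sides and fills the gaps with NaN.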

• Economics – constant demand for data analysis.
• Recommendation systems – e.g., Spotify or Netflix.
• Stock prediction – large amounts of historical stock data describe how stocks have behaved.
• Statistics – Pandas helps in performing statistical calculations.

• A Series is a one-dimensional labeled array capable of holding data of any type.
pandas.Series(data, index, dtype, copy)
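A minimal sketch of the constructor parameters listed above (`data`, `index`, `dtype`, `copy`); the values and labels are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Series from a list with custom index labels and an explicit dtype
s = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="int64")

# Series from a dict: the keys become the index labels
d = pd.Series({"x": 1.5, "y": 2.5})

# copy=True makes the Series hold its own copy of the NumPy array
arr = np.array([1, 2, 3])
c = pd.Series(arr, copy=True)
arr[0] = 99  # does not affect c because the data was copied

print(s["b"], d.index.tolist(), c[0])
```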

• A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
pandas.DataFrame(data, index, columns, dtype, copy)
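The sketch below builds a DataFrame in two common ways using the `data`, `index`, `columns`, and `dtype` parameters; the names and numbers are placeholders.

```python
import pandas as pd

# DataFrame from a dict of columns; index gives the row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [28, 35]},
    index=["r1", "r2"],
)

# DataFrame from a list of rows, with explicit column names and dtype
rows = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"], dtype="float64")

print(df.shape)         # (2, 2): 2 rows, 2 columns
print(rows.loc[1, "b"])  # 4.0
```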

• An Index is an immutable sequence used for indexing and alignment.
pandas.Index(data, dtype, copy, name, tupleize_cols)
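To illustrate both properties named above, immutability and alignment, here is a small sketch; the labels `a`, `b`, `c` and the name `key` are arbitrary.

```python
import pandas as pd

# Create an Index explicitly and give it a name
idx = pd.Index(["a", "b", "c"], name="key")

# Index objects are immutable: item assignment raises TypeError
immutable = False
try:
    idx[0] = "z"
except TypeError:
    immutable = True

# Alignment: arithmetic between Series matches values by index label,
# not by position; labels present in only one operand give NaN
s1 = pd.Series([1, 2, 3], index=idx)
s2 = pd.Series([10, 20], index=["b", "c"])
total = s1 + s2
print(total)  # a: NaN, b: 12.0, c: 23.0
```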

• Index.values: Return an array representing the data in the Index.
• Index.is_unique: Return True if the index has unique values, otherwise False.
• Index.has_duplicates: Return True if the Index has duplicate values.
• Index.hasnans: Return True if there are any NaNs.
• Index.size: Return the number of elements in the underlying data.
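A quick sketch of these attributes on an Index that deliberately contains a duplicate and a NaN:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1.0, 2.0, 2.0, np.nan])

print(idx.values)          # underlying NumPy array
print(idx.is_unique)       # False: 2.0 appears twice
print(idx.has_duplicates)  # True
print(idx.hasnans)         # True: contains NaN
print(idx.size)            # 4
```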

• axes - Returns a list of the row axis labels.
• dtype - Returns the dtype of the object.
• empty - Returns True if the Series is empty.
• ndim - Returns the number of dimensions of the underlying data, by definition 1.
• size - Returns the number of elements in the underlying data.
• values - Returns the Series as an ndarray.
• head(n) - Returns the first n rows.
• tail(n) - Returns the last n rows.
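The Series attributes and methods listed above can be tried on a small example Series (values chosen arbitrarily):

```python
import pandas as pd

s = pd.Series([5, 3, 8, 1, 9, 2], index=list("abcdef"))

print(s.axes)     # list holding one Index with the row labels a..f
print(s.dtype)    # integer dtype, e.g. int64
print(s.empty)    # False
print(s.ndim)     # 1
print(s.size)     # 6
print(s.values)   # [5 3 8 1 9 2]
print(s.head(2))  # first two rows: a=5, b=3
print(s.tail(2))  # last two rows: e=9, f=2
```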

• Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.
• Multiple operations can be accomplished through reindexing, such as:
  - Reordering the existing data to match a new set of labels.
  - Inserting missing value (NA) markers in label locations where no data for the label existed.
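Both reindexing operations described above, reordering and inserting NA markers, show up in this small sketch; the `price` and `qty` column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=["a", "b", "c"])

# Reorder existing rows and add a label ("d") that has no data;
# pandas inserts NaN for the missing label
re_rows = df.reindex(["c", "a", "d"])

# Reindex the columns as well: "qty" does not exist, so it is all NaN
re_cols = df.reindex(columns=["price", "qty"])

print(re_rows)
print(re_cols)
```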

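reindex(labels, method, copy, level, fill_value, limit, tolerance)
• labels - String or list containing the new row index or column labels (they can also be passed as index= or columns=).
• method - None | 'bfill' | 'ffill' (how to fill labels that had no data; only applicable when the index is monotonically increasing or decreasing)
• copy - True | False
• level - Number | Label (level of a MultiIndex to match on)
• fill_value - Value to use for missing labels (default NaN)
• limit - Number (maximum number of consecutive elements to forward- or backward-fill)
• tolerance - Maximum distance between original and new labels for inexact matches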
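The sketch below contrasts `fill_value`, `method="ffill"`, and `limit` on a Series with gaps in its integer index; the labels 0 and 3 are chosen only so that each gap is two labels wide.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0, 3])
new_labels = [0, 1, 2, 3, 4, 5]

# fill_value: use 0.0 instead of NaN for labels that did not exist
filled = s.reindex(new_labels, fill_value=0.0)

# method="ffill": propagate the last valid value forward
# (the original index must be monotonic for method to work)
ffilled = s.reindex(new_labels, method="ffill")

# limit=1: forward-fill at most one consecutive missing label
limited = s.reindex(new_labels, method="ffill", limit=1)

print(filled.tolist())   # [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
print(ffilled.tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
print(limited.tolist())  # [1.0, 1.0, nan, 2.0, 2.0, nan]
```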

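• sort_index(axis) - Sorts by the row labels (axis=0) or the column labels (axis=1).
• sort_values(by='column') - Sorts by the values in one or more columns.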
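A short sketch of both sorting methods on a small, deliberately unsorted DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"b": [3, 1, 2], "a": [9, 8, 7]},
    index=["z", "x", "y"],
)

by_rows = df.sort_index()          # rows ordered by label: x, y, z
by_cols = df.sort_index(axis=1)    # columns ordered by label: a, b
by_value = df.sort_values(by="b")  # rows ordered by column "b": 1, 2, 3
desc = df.sort_values(by="a", ascending=False)  # largest "a" first

print(by_rows, by_cols, by_value, desc, sep="\n\n")
```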

• The DataFrame.rank() method computes the numerical rank (1 through n) of each value along an axis. A value's rank is based on its position after sorting.
rank(axis=0, method='average', numeric_only=_NoDefault.no_default, na_option='keep', ascending=True, pct=False)

• axis {0 or 'index', 1 or 'columns'}, default 0 – Index to direct ranking. For Series this parameter is unused and defaults to 0.
• method {'average', 'min', 'max', 'first', 'dense'}, default 'average' – How to rank the group of records that have the same value (i.e. ties):
  - average: average rank of the group
  - min: lowest rank in the group
  - max: highest rank in the group
  - first: ranks assigned in the order they appear in the array
  - dense: like 'min', but rank always increases by 1 between groups
• numeric_only bool, optional – For DataFrame objects, rank only numeric columns if set to True.
• na_option {'keep', 'top', 'bottom'}, default 'keep' – How to rank NaN values:
  - keep: assign NaN rank to NaN values
  - top: assign lowest rank to NaN values
  - bottom: assign highest rank to NaN values
• ascending bool, default True – Whether or not the elements should be ranked in ascending order.
• pct bool, default False – Whether or not to display the returned rankings in percentile form.
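To see how the tie-breaking methods and NaN options differ, this sketch ranks one Series with a tie (two 3s) and a missing value; the column names of the result table are just labels for each variant.

```python
import numpy as np
import pandas as pd

s = pd.Series([7, 3, 3, 10, np.nan])

ranks = pd.DataFrame({
    "average": s.rank(),                       # tied 3s share (1 + 2) / 2 = 1.5
    "min": s.rank(method="min"),               # tied 3s both get 1
    "max": s.rank(method="max"),               # tied 3s both get 2
    "first": s.rank(method="first"),           # ties broken by order of appearance
    "dense": s.rank(method="dense"),           # no gap after the tie: 7 gets 2
    "nan_bottom": s.rank(na_option="bottom"),  # NaN ranked last (5.0)
    "pct": s.rank(pct=True),                   # average rank / number of non-NaN values
})
print(ranks)
```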

• DataFrame.describe() is used to view basic statistical details, such as percentiles, mean, and standard deviation, of a DataFrame or a Series of numeric values.
DataFrame.describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)
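A small sketch of `describe()` with its default output, custom `percentiles`, and `include="all"`; the `score` and `team` columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"score": [50, 60, 70, 80, 90], "team": list("aabbb")})

# Default: summarise numeric columns only (count, mean, std, min, quartiles, max)
print(df.describe())

# Custom percentiles; the median (50%) is always included
print(df.describe(percentiles=[0.1, 0.9]))

# include="all" also summarises non-numeric columns (count, unique, top, freq)
print(df.describe(include="all"))
```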

• The Series.unique() method returns the unique values of a Series in order of appearance.
Series.unique()
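A minimal example of `unique()`, together with the related `nunique()` method for counting distinct values:

```python
import pandas as pd

s = pd.Series([2, 1, 2, 3, 1])

u = s.unique()      # NumPy array, values in order of first appearance
print(u)            # [2 1 3]
print(s.nunique())  # 3 distinct values
```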

• Series.value_counts() returns a Series containing counts of unique values.
Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
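The sketch below exercises the `normalize`, `dropna`, `bins`, and `sort` parameters from the signature above on small example Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", np.nan])

counts = s.value_counts()                # sorted by count, NaN dropped
shares = s.value_counts(normalize=True)  # relative frequencies
with_nan = s.value_counts(dropna=False)  # NaN counted as its own value

# bins: group numeric values into equal-width intervals (kept in bin order)
nums = pd.Series([1, 2, 3, 4, 5, 6])
binned = nums.value_counts(bins=2, sort=False)

print(counts, shares, with_nan, binned, sep="\n\n")
```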

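Handling Missing Data
• pd.isna() - Detects missing values (True where a value is NaN or None).
• pd.notna() - Detects non-missing values (the inverse of pd.isna()).
Filter Missing Data
• Series.dropna() - Returns a new Series with the missing values removed.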
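A minimal sketch of detecting and filtering missing values; note that `None` in a float Series is stored as NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None])

print(pd.isna(s).tolist())   # [False, True, False, True]
print(pd.notna(s).tolist())  # [True, False, True, False]

# Keep only the non-missing entries (labels 0 and 2)
clean = s.dropna()

# Equivalent boolean filter using pd.notna()
same = s[pd.notna(s)]
print(clean)
```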
