One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered: it can take up to 1,048,576 rows of data per worksheet (roughly 1.05 million), contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It also has a number of rich add-ons that extend its analytical and data visualization functionality, and it works as a great bridging tool to more complex types of statistical analysis.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie charts and doughnuts, bar charts, tree maps and sunburst diagrams, cluster diagrams, spider (radar) charts, scattergraphs, and others. This session will cover how data structures and desired emphases will determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
Also, participants will help co-build data visualizations from open-source and other datasets.
4. Presentation Order
• Sourcing Datasets
• Reading General Data in a Data Table / Worksheet
• Processing or Cleaning Data
• Using the Recommended Charts Feature in Excel 2016
• Selecting Data Visualization Types
• Column, line, pie, bar, area, X Y (scatter), stock, surface, radar, treemap,
sunburst, histogram, box & whisker, waterfall, & combo
• Going “Off-Script” within Excel
• Some Common Mistakes
5. Presentation Order (cont.)
• Adding Relevant Data Visualization Elements
• Processing Graph Visualizations Outside of Excel 2016
• Add-ins to Excel 2016
• Streamgraphs, #hashtag networks on microblogging sites (on Twitter), related
tags networks (on Flickr), article-article networks on MediaWiki (on
Wikipedia), and others
• Data Visualization Standards
• A Note about Data
7. Sourcing Datasets
• Downloading public datasets from sites like data.gov
• Capturing the back data (provenance information) about how the publicly released datasets were produced and released
• Extracting data from online data portals (research sites, survey sites,
learning management systems, social media platforms, and others)
and converting those files into something readable in databases and
Excel
8. Sourcing Datasets (cont.)
• Downloading data from social media platforms
• These may include Facebook poststreams, Twitter tweetstreams, scraped
images from the Web and any image sharing sites, scraped videos from the
Web and video-sharing sites, articles from Wikipedia, #hashtag networks from
Twitter, keyword networks from Twitter, related tags networks from Flickr,
email networks from email systems, and others
• These datasets include both structured and semi-structured data
9. Sourcing Datasets (cont.)
• Autogenerating data…
• From online research suites (often used to test surveys)
• From graph visualization tools (to see what randomized graphs look like)
• Creating datasets in other software programs and saving out in a file
format readable by Excel
• Creating data manually in an Excel worksheet
• Capturing data in Excel using third-party data downloaders, and
others
10. Data Analytics Suites
• Some datasets may be exported from data analytics suites.
• SPSS, RapidMiner Studio, R, Python, and other tools may be used for high-level statistical analysis and machine learning. However, their data visualization tools may be focused more on conveying data than on producing presentation-quality data visualizations.
• The underlying data may be exported in a form that Excel can use…in
order to create the data visualizations. (Excel has a lot of analytics
capabilities built-in, too, but complex analytics likely require
processing in other software programs.)
11. Data Capture and Pre-Processing
• Prior to importing data into Excel, it is likely that the data will need to be pre-processed / cleaned for accuracy.
• All data are changed with every touch of technology:
• Software may be used to extract or capture data (such as from social media platforms). There are limits to APIs, which virtually all restrict the rate and the amount of data that can be captured for free.
• Software may be used to convert manual coding into digital coding (transcoding).
• Software may be used to turn unstructured and semi-structured data into quantitative data tables (such as text analytics applications). The reverse is common, too: taking quantitative data and turning it into semi-structured forms (as visuals).
• Software may be used to create synthetic or faux data that meets particular
requirements (such as a random network graph).
12. Data Everywhere and Fungible
• In other words, it is possible to datafy a lot of things.
• There is data everywhere…
• It is possible to turn most data into information and into something at least somewhat useful.
14. Structured Data
• Structured data is labeled by row and column headers
• Such data is categorizable by type and by common characteristics and functions
• Rows tend to be data records (with unique identifiers in Column A)
• Columns tend to be variables and attributes
• Data types include the following: General, Number, Currency, Accounting, Date, Time, Percentage, Fraction, Scientific, Text, Special, and Custom
• Each cell is labeled by data type, and these types affect how the software handles the data
15. “Unstructured” or “Semi-Structured” Data
• Text sets, bags of words
• Image files
• Audio files
• Video files
• Multimedia, and others
• Tend to be multi-dimensional and / or high-dimensional data
• Tend to be somewhat inherently structured based on the data type (language has some inherent structure; imagery may be defined within 2D or 3D space, etc.), thus the preference for “semi-structured” among word purists
• Tend to be various file types, with different file extensions
16. Basic “Structured” Data Structures
• Column A tends to contain unique identifiers for the row data
• Row 1 tends to contain all the column headers
• Column headers tend to be written in CamelCase format
• Each row of the row data except the first (header) row contains an individual record
• Each column contains a single variable
• Each column tends to contain data of a certain type, such as string / text, numerical, percentage, date, and others
• Some of the data is human readable, and some is not (for example, a value displays as #### when its column is too narrow)
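The conventions above can be sketched with a small stdlib-only example; the table contents and column names here are hypothetical, invented for illustration:

```python
import csv
import io

# A minimal "structured" table following the conventions above:
# Row 1 holds CamelCase column headers; Column A holds unique identifiers;
# each later row is one record; each column holds one variable of one type.
raw = """RecordID,RespondentAge,SatisfactionScore
R001,34,4.5
R002,27,3.8
R003,41,4.9
"""

reader = csv.DictReader(io.StringIO(raw))
records = list(reader)

# Each row parses into one record keyed by the header names.
print(records[0]["RecordID"])                  # R001
print(float(records[2]["SatisfactionScore"]))  # 4.9
```

The same layout is what Excel (and most analysis tools) expect when importing a .csv file.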
17. Coding “Structured” Data
• Structured data generally has a long history of conventional statistical
approaches to analysis, to identify patterns in the data.
• There are simple counts.
• There are measures of central tendency for parametric datasets.
• There are tools for observing and measuring associations.
• There are tools for observing and measuring causation-based associations.
• There are tools to compare observed data vs. expected data, and measures of
statistical significance.
• There are tools to support experimental setups, to compare control groups
with experimental groups.
• There are tools to measure confidence in statistical findings.
18. Basic “Unstructured” and “Semi-Structured”
Data Structures
• Language data tends to have an inherent structure based on how
evolved languages originate and change over time.
• Image data tends to have an inherent structure based on image
features: image sizes, orientation, main subject matter, colors,
resolution, and other factors.
• Audio data tends to have an inherent structure by voiceprint (and / or
waveform), occurrences in time, sound frequencies, and other
factors.
• Video data tends to have an inherent structure by frame-based imagery (frames per second), waveforms, and other factors.
19. Coding “Unstructured” and “Semi-Structured”
Data
• So-called “unstructured” or “semi-structured” data are coded in a
variety of different ways.
• One approach is with a priori coding, or using an extant model, conceptual
framework, or other structure to create a codebook against which the data
are coded.
• Another general approach is with “emergent” coding, which starts with the
raw data and results in an evolved codebook.
• Then, there are many combinations of the two above approaches.
20. Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• Such unstructured / semi-structured data are multi-dimensional, so
they can be analyzed in a variety of different ways and are somewhat
robust against having a certain interpretation stick and predominate
over others.
• Data are generally polysemous or multi-meaninged.
• There are public text corpora that have been created for broad-scale
use in the testing of software tools, programs, algorithms, and
processes for text analysis, in order to be able to have comparable
and competitive analyses.
21. Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• There are some text corpora which are non-consumptive, which
means only the top-level statistics and other metrics about a text set
are available, but the underlying texts (the actual data) themselves
are not. “Shadow” datasets are made accessible for the queries, but
to avoid the risk of re-identification of original copyrighted
manuscripts, the original manuscripts in their original order are not
made available. (Google Books Ngram Viewer is a well known
example.)
22. Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• Such data may be coded by humans alone, computer alone, or a
cyborg-ian mix
• Advances in computer vision (object identification, sentiment analysis
of images, predictivity of “what happens next” in a video sequence)
and other capabilities have extended computer capabilities at coding
such data
26. Unlinked or Linked Data Tables
Flat Files
• Data tables treated as single
stand-alone files that may be
assessed alone or queried in
relation to other files
Linked Files
• Data tables treated as
interconnected and related files
that may be queried across data
tables and fields
28. Some Common Questions for Data Processing
or Cleaning
Structured Data
• How should missing data be
handled? (Should empty cells
mean deleting the whole record?
Should empty cells be filled with
N/A? Should empty cells be filled
with randomly-generated contents
based on the other data in the set?
Should empty cells be zeroed out?)
• How should repeated data be
handled?
Unstructured or Semi-Structured Data
• How should scraped imagery that
consists of a corrupted file be
handled? Should these be omitted?
Should these be kept and partially
coded?
• In an image set, how should
different versions of an image be
coded? Should that be counted
multiple times? What if the image
is re-inscribed and reused by
others in new ways?
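As a rough stdlib-only sketch of three of the missing-data options raised above (delete the record, fill with N/A, zero out), using a small hypothetical table:

```python
# Hypothetical records; the empty string marks a missing cell.
records = [
    {"ID": "R1", "Score": "4"},
    {"ID": "R2", "Score": ""},   # missing value
    {"ID": "R3", "Score": "5"},
]

# Option 1: delete the whole record with the empty cell.
dropped = [r for r in records if r["Score"] != ""]

# Option 2: fill empty cells with "N/A".
filled = [dict(r, Score=r["Score"] or "N/A") for r in records]

# Option 3: zero out empty cells.
zeroed = [dict(r, Score=r["Score"] or "0") for r in records]

print(len(dropped))        # 2
print(filled[1]["Score"])  # N/A
print(zeroed[1]["Score"])  # 0
```

Which option is right depends on the dataset and the research question; the code only shows that each policy is mechanically simple once chosen.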
29. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In a set of parametric data, how should extreme outliers be handled? Should they be omitted, so as not to skew a curve? Should they be treated differently than the other data in the set?
Unstructured or Semi-Structured Data
• In a multi-lingual text set, how should all the languages besides the base language be handled? Should these language inputs be handled manually? Should these non-base-language inputs be translated into the base language for machine analysis?
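One common (though by no means the only) way to flag extreme outliers in parametric data is a z-score cutoff; a minimal sketch with hypothetical values:

```python
import statistics

# Hypothetical parametric data with one extreme outlier (250).
values = [10, 12, 11, 13, 12, 11, 250]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation

# Flag points more than 2 standard deviations from the mean.
outliers = [v for v in values if abs(v - mean) > 2 * sd]
kept = [v for v in values if abs(v - mean) <= 2 * sd]

print(outliers)   # [250]
print(len(kept))  # 6
```

Whether flagged points are omitted or treated separately remains the researcher's call, as the questions above emphasize; the cutoff only surfaces candidates.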
30. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• There may be benefits to
combining multiple open-source
datasets, which each have insights
to contribute to the study of a
particular issue. The variables are
not exactly mappable to each
other though. How should such
datasets be melded? How should
the mixed dataset be described?
How should the original datasets
be credited?
Unstructured or Semi-Structured Data
• In a mixed multi-modal dataset
of various multimedia contents,
there is a lot of room for
interpretation and subjectivity.
What tools should be designed
to aid in creating consistency in
the coding and interpretations?
31. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• The original labeling of online
data from an online research
suite is too verbose and
complex. Renaming the
variables is important to enable
easier data processing and
easier setup of data tables.
What is a legitimate process of
renaming variables for accuracy
and efficiency?
Unstructured or Semi-Structured Data
• Machine coding enables faster processing of various types of unstructured and semi-structured data. However, machine coding also introduces some degree of ambiguity and “noise.” How should the use of computers to code be balanced against human-based insights?
32. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In combining manual coding for a team-coded project, there are some new codes that were not part of the original codebook. Should these new codes be included in the similarity analysis computation for a Cohen’s Kappa / Kappa Coefficient?
Unstructured or Semi-Structured Data
• In a particular study, there is a
set of videos that has been
hacked and taken from a
company. The videos are
relevant to the research and
would offer value, but they are
not legally available. Should
these videos be used, or should
they be expunged from the
study set?
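For reference, Cohen’s Kappa compares two coders’ observed agreement against chance agreement; a minimal stdlib sketch with hypothetical codes (whether new, off-codebook codes belong in the computation is the policy question above, which the arithmetic itself does not settle):

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's Kappa for two coders' labels on the same items."""
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    # Chance agreement: product of each coder's marginal proportions,
    # summed over the categories both coders used.
    expected = sum((c1[k] / n) * (c2[k] / n) for k in c1.keys() & c2.keys())
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to four items.
kappa = cohens_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])
print(round(kappa, 2))  # 0.5
```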
33. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• An online survey system has
accidentally captured
respondent identifier
information during the normal
course of the data capture. The
demographic data may be used
for deeper analytics. Should this
data be used? How so? Why or
why not?
Unstructured or Semi-Structured Data
• Scraped online data come from a
variety of sources, and the source
citations may be hard to find.
There are some tools that enable
reverse image searches, but other
search tools are more painstaking
to use, particularly for video
searches. How much effort should
be put into having proper and
correct citations for the original
sources?
34. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In many fields, original datasets
have to be published out and
shared at the time of publication.
In the process of releasing data, a
researcher has to go through a
process of de-identification…and
has to work hard to ensure that
the data may not be re-identified.
How much due diligence should a
researcher go through to protect
the participants of his / her / their
study?
Unstructured or Semi-Structured Data
• There are machine-generated transcripts available for videos hosted on social video-sharing sites. The transcripts are sometimes improved with human coding, but in many cases the transcripts are not directly fixed and so include various mistakes. Should these transcripts be corrected before they are coded for research? Or should they go in, mistakes and all, even if this means that some garble is included?
35. Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In some cases, conceptual data
may be applied to communicate
theories, models, and
frameworks. Also, there may be
synthetic or faux data. How
should one communicate the
fact that this data is conceptual?
Unstructured or Semi-Structured Data
• In an image set, there will be images of various types: photos, screen grabs of virtual worlds, screen stills of videos, diagrams, drawings, scans of documents, and other types of visuals. How should the various modalities be addressed?
36. General Points about Data Processing /
Cleaning
• There should be clear principles and rationales for how data is
handled. These should be clearly documented.
• Generally, data processing should not be lossy (lose information).
• Data processing should be selective but non-destructive.
• Data processing should not result in undue skew or bias to the
original data.
• Data processing should not result in data leakage, confidentiality compromises, re-identification of research participants, or any compromise of data privacy.
37. General Points about Data Processing / Cleaning (cont.)
• There should be clear steps and processes applied to data processing, and these should be documented. If there are deviations from this processing, those should be recorded as well.
• A raw set of all data should be preserved in its initial pristine state
before any data processing or data cleaning is done. This is to ensure
that there is a pristine master set against which to re-extract a new
set for other processing…and also a master against which to compare
cleaned datasets.
• If this data processing is done by machine, the “macros” should be
documented.
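One lightweight way to confirm that a preserved master set is still pristine is to record a cryptographic hash when it is archived and re-check it later; a sketch, with hypothetical file contents:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 fingerprint of a raw dataset's bytes."""
    return hashlib.sha256(data).hexdigest()

# Record the hash when the raw master set is first archived.
master = b"id,score\nR1,4\nR2,5\n"
archived_hash = fingerprint(master)

# Later: an unchanged copy verifies; a modified copy does not.
print(fingerprint(master) == archived_hash)                     # True
print(fingerprint(b"id,score\nR1,4\nR2,9\n") == archived_hash)  # False
```

This does not replace documenting the processing steps; it only gives an objective check that the pristine master was never altered.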
39. Accessing the “Recommended Charts”
Feature
• To access this feature, highlight the desired data to map (from the dataset), click the “Insert” tab, and select “Recommended Charts.”
40. About the “Recommended Charts” Feature
• This “recommended charts” feature offers some cognitive scaffolding
to new data visualizers by suggesting possible data visualizations.
• This feature seems to offer the simplest options first, even after a user may have gone with more complex data visualizations for similarly structured data.
41. Selecting the Right Amount of Data to Map
• Every selected cell of data, even an empty one, contains meaning in the data visualization, and it will be represented there.
• The selected data should be structurally coherent.
• In other words, the positioning of the respective cells should convey to the software how the data
visualization should be drawn. Part of data preparation involves the positioning of the data in a
correct structure.
• It is possible to transpose elements on an axis and make other changes once the image is drawn,
but it’s preferable to have the data structured correctly.
• The selected variables in a dataset should be interconnected. If the data is not
interconnected somehow—by meaning or by potential association—then it would be
harder to justify having the same information in the same visualization.
• Too much data will mean that Excel cannot draw the graph. Too little information will
mean that the data visualization is not clear.
• Data labels are usually handled, in part, in the data table itself. Those should be correctly
set up, with proper spelling, proper capitalization, proper CamelCase (if used), and
parallel construction.
42. Accessing the “All Charts” Feature
• There is one tab that offers some “Recommended Charts”. The tab
next to it offers “All Charts.”
• Both are interactive and selectable.
43. The “Recommended Charts” Feature
• This Excel feature assesses the types of data in the dataset or
worksheet and proposes a few data visualizations that may best
represent that data.
• Sometimes, one needs to restart the software to get this to work.
• Some other software tools (like IBM Watson) will actually
preliminarily analyze the data and suggest aspects of the data to focus
on for human analysts.
44. The “Recommended Charts” Feature (cont.)
• If too much data have been highlighted, then a message will be
shown. It will read in part “Recommendations are not available for
the data you selected. To choose a chart type, click All Charts.”
• Some reasons why a chart may not be identifiable include the
following:
• no numbers that are summarizable
• data from multiple worksheets
• too many data cells
• data containing defined names (“range names”), columns defined as variables with particular characteristics
• such as combinations of various columns linked by mathematical functions into a new variable
45. The “Recommended Charts” Feature (cont.)
• This feature is a generalized one and does not include deep or unique
or insider insights about the underlying data.
• This means that the suggestions made may not be optimal for the dataset or
the context of the researcher.
• Researcher objectives will also affect the selection of the optimal data visualizations (and data visualization sequences).
51. Abstracting Core Descriptive Functions in
Data Visualizations
• Proportionality (“intensity”)
  • Frequency counts
  • Pie charts, bar charts, intensity matrices, area charts, radar diagrams, histograms
• Changes over time
  • Frequency counts over time
  • Line graphs, scattergraphs
• Hierarchical relationships
  • Word networks, frequency word counts, topic modeling in word sets (text corpora)
  • Word network graphs, dendrograms, sunburst diagrams
• Descriptive statistics
  • Distribution; central tendency (mean, median, mode); dispersion (standard deviation, min-max range, variance)
  • Bar charts, curves
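The descriptive statistics listed above map directly onto Python’s stdlib `statistics` module; a sketch with hypothetical values:

```python
import statistics

# Hypothetical dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))       # 5
print(statistics.median(data))     # 4.5
print(statistics.mode(data))       # 4
print(statistics.pstdev(data))     # 2.0 (population standard deviation)
print(statistics.pvariance(data))  # 4
print(max(data) - min(data))       # 7 (min-max range)
```

These are the same quantities that underlie the bar charts and curves named above, whether computed in Excel or elsewhere.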
52. Abstracting Core Descriptive Functions in
Data Visualizations (cont.)
• Social relationships
  • Intercommunications, follower-followee relationships
  • Social networks
• Physical-spatial relationships
  • Events occurring in space, locations
  • Geographical maps
53. Abstracting Core Analytical Functions in Data
Visualizations: Deductive, Inductive, Inferential
• Data relationships
  • Associations, causations
  • Scattergraphs, line graphs, line plots
• Text analysis
  • Word frequency counts, text queries, topic modeling (unsupervised theme extraction), sentiment analysis
  • Cluster diagrams, word clouds, word trees, dendrograms, bar charts, intensity matrices
54. Filtering Data
• The “Sort & Filter” option enables users to select a column or segment of a column to alphabetize, to sort numerically (from most to least or least to most), or to sort by date (from most recent to least recent, or vice versa), and so on.
55. Filtering Data (cont.)
• A “Sort Warning” window asks whether the user wants to “Expand
the selection” or just “Continue with the current selection”
• Generally, the selection should be expanded. This means that the entire row
of data will move with however the selection moves. The data will still be
correct and of-a-piece.
• If not, only the selection will be sorted, and all the other row data will be left
in their prior positions. If there is a very limited issue that is being addressed
and the whole dataset is pristine and accurate elsewhere, then just
continuing with the current selection may be the right choice.
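The “Expand the selection” behavior, keeping each whole row together while sorting on one column, can be sketched with hypothetical rows:

```python
# Hypothetical rows: (ID, Name, Score). Sorting on Score with the
# selection "expanded" keeps every full row intact, mirroring Excel's
# "Expand the selection" choice in the Sort Warning dialog.
rows = [
    ("R1", "Ann", 72),
    ("R2", "Bo", 95),
    ("R3", "Cy", 61),
]

by_score_desc = sorted(rows, key=lambda r: r[2], reverse=True)
print([r[0] for r in by_score_desc])  # ['R2', 'R1', 'R3']
```

Sorting only the Score column by itself would be the equivalent of “Continue with the current selection”: the other cells would stay put, and the rows would no longer be of-a-piece.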
57. What Follows
• In the following section are the main types of data graphs enabled in
Excel with its built-in charting features.
• Each section begins with the type of chart and some general
characteristics, followed by examples.
• A majority of the examples are drawn with open-source real-world data. One
data visualization was created using synthetic data for effects, and that
visualization has been labeled as being created using faux data.
• The data may have been processed using other tools, but the graphs
themselves were all created in Excel.
• On the same slide as the graph or directly after each graph is a table
with the underlying data, to help viewers understand the connection
between the data and the visualization.
58. Column Charts
• Column graphs tend to be vertical (vs. horizontal).
• In other words, they tend to align with the orientation of a column.
• Column graphs may be 2D or 3D.
• The common shapes representing data are rectangles.
• Columns may be stacked.
• Related columns may be clustered.
• Columns may be summed to 100% in “100% stacked column (or bar)
charts.”
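Behind a “100% stacked” chart, each row is normalized so its segments sum to 100; a sketch using sentiment counts from one of this deck’s sample data tables:

```python
# Sentiment counts for one group (from the deck's sample data table).
counts = {"Very Negative": 210, "Moderately Negative": 305,
          "Moderately Positive": 399, "Very Positive": 245}

total = sum(counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}

print(percentages["Very Negative"])      # 18.1
print(round(sum(percentages.values())))  # 100
```

Excel performs this normalization automatically when the 100% stacked chart type is chosen; the raw counts stay untouched in the worksheet.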
60. Data Structure for the Vertical Column or Bar
Chart on the Prior Slide
pronoun | ppron | i | we | you | shehe | they | ipron | article | prep | auxverb | adverb | conj | negate
2.47 | 0.36 | 0.07 | 0.08 | 0.07 | 0.00 | 0.15 | 2.10 | 5.77 | 10.78 | 3.74 | 1.69 | 4.07 | 0.52
62. Data Structure for the Stacked Bar Chart in
the Prior Slide
Group | Very Negative | Moderately Negative | Moderately Positive | Very Positive
SpaceX Public Group FB | 210 | 305 | 389 | 245
Tesla Motors Club FB | 12 | 20 | 45 | 23
63.
Group | Selfies | Dronies
Babies | 7 | 3
Children | 63 | 21
Teens | 20 | 9
20s and 30s | 943 | 168
40s | 49 | 10
50s | 25 | 7
60s | 12 | 2
70s and older | 1 | 4
Mixed Age | 224 | 116
Unknowable | 27 | 185
64. Line Charts
• Line graphs tend to be horizontal
• Line graphs may represent changes over time
• In such cases, time is represented on the x-axis, and some variable with a
numerical measure (counts, percentages, frequencies, intensities) is
represented in the y-axis
• Time units should be consistent
• Line graphs with time on the x-axis may be enhanced with a drawn
“trendline” to indicate directionality of the phenomena over the
studied / observed time frame and into the near future.
• Comparative line graphs may show multiple related factors (variables)
interacting over time with each other
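A drawn trendline is typically an ordinary least-squares fit; the slope and intercept can be computed directly. A sketch with hypothetical points that lie exactly on a line, so the fit is exact:

```python
# Hypothetical time-series points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

print(slope, intercept)  # 2.0 1.0
```

A positive slope corresponds to the upward directionality a trendline conveys over the observed time frame.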
65. Line Charts (cont.)
• Line graphs may have two different variables with one represented on the
x-axis and one on the y
• The line itself then may show some association between the two variables (which
should be continuous variables)
• The associations may be negative or positive
• The associations may be more complex and curvilinear (not staying consistent one way or
another over time)
• Or there may be no apparent association
• Where a bar graph (the prior one) suggests a discretization (and “space”)
between the bar elements, a line graph suggests more nuance and some
continuity in the variables (and less space or no space between variables,
expressed as a dotted line or a continuous line).
68.
Group | Very Negative | Moderately Negative | Moderately Positive | Very Positive
SpaceX Public Group FB | 210 | 305 | 399 | 245
Tesla Motors Club FB | 12 | 20 | 45 | 23
69. Pie Charts
• Pie charts are used to represent (related) proportions of a whole.
Proportions are determined numerically—by raw counts or
percentages, usually.
• The respective proportions are represented as “slices.”
• Pie charts may be 2D or 3D.
• One version of a “pie chart” is a doughnut, which is a circular
proportional representation.
• “Exploding” pie charts have sections that are pulled out from the
main pie as a point-of-emphasis.
73. Bar Charts
• Bar charts use rectangular shapes (and the sizes of these shapes) to
indicate quantities and intensities.
• Bar charts may have bars be either horizontal or vertical.
• Bar charts may be 2D or 3D.
• The bars may be stacked; they may be clustered.
• Bars may be summed to 100% in “100% stacked column (or bar)
charts.”
• Note: The bar chart types are as follows: vertical stacked bar chart,
100% stacked horizontal bar chart, and a Pareto chart.
76. Data Structure for the 100% Bar Chart on the Prior Slide

Group | Very Negative | Moderately Negative | Moderately Positive | Very Positive
SpaceX Public Group FB | 210 | 305 | 399 | 245
Tesla Motors Club FB | 12 | 20 | 45 | 23
78. Data Structure and Source for Stacked Bar
Charts in Prior Slide
• Data from “Comparative Analysis of 4-H Enrollment and U.S. Census
School Data”
• conditional data distribution
• REEIS Report
• July 2010
• 4H38-Comparitive(sic)-Analysis-of-4H-Enrolment(sic)-US-census-
school-grade-data.xlsx
Region Name: <All>   State Name: <All>

Grade | Kindergarten | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th | 11th | 12th
4-H Enrollment | 3,091,210 | 3,090,230 | 3,733,658 | 5,327,058 | 6,851,803 | 6,383,579 | 4,522,690 | 2,955,880 | 2,465,910 | 1,625,339 | 1,444,406 | 1,258,010 | 989,248
US Census | 27,624,237 | 27,945,407 | 28,086,468 | 27,494,293 | 28,503,328 | 28,461,965 | 28,102,045 | 27,757,994 | 27,640,711 | 27,576,100 | 27,584,713 | 27,644,003 | 27,744,150
79.
Main Time Zones
Alaska | 3
Arizona | 7
Asuncion | 1
Auckland | 1
Baghdad | 1
Bangkok | 1
Beijing | 1
Bogota | 1
Brasilia | 2
Bucharest | 1
Central America | 1
Central Time (US & Canada) | 34765
Copenhagen | 1
Eastern Time (US & Canada) | 130
Harare | 1
Hawaii | 3
Indiana (East) | 2
Islamabad | 1
Jerusalem | 1
La Paz | 1
London | 1
Mid-Atlantic | 1
Moscow | 1
Mountain Time (US & Canada) | 59
Nairobi | 1
Pacific Time (US & Canada) | 46
Paris | 1
Rome | 3
Seoul | 2
Tokyo | 1
Total | 35041
80. Area Charts
• Area charts are built from line charts.
• In these, the areas under the respective lines are filled in with certain
colors and / or textures to indicate quantitative data (frequencies,
amounts, intensities, etc.).
• The data may be composed of one record or multiple comparable records.
• The areas are usually somewhat transparent to enable visualizing
other related data records to enable comparisons of quantities.
86. Data Structure and Source for Area Chart in Prior Slide
• Comparison of search frequencies for “selfie” and “selfie guy” on Google Search from 2004 – 2017 (June)
• Selection of two columns of less-correlated, less co-varying web search activity data from the related downloadable .csv file
• Correlations are over time with normalized data (z-scores) over weekly and monthly intervals in the time period
• Extracted from Google Correlate data
87. X Y (Scatter) Charts
• Scatter graphs (aka scatter plots or scatter diagrams) capture two sets
of point data.
• On the respective x-axis and y-axis, different variables are
represented. These variables are often continuous (vs. discrete) ones.
• Sometimes, lines are drawn through the data to help in visualizing
positive associations (the increase in one results in the increase in the
other), negative associations (the increase in one results in the
decrease in the other), no relations, or curvilinear relations (more
complex associations than linear ones).
• Of course, correlations do not mean causation per se.
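The association a scattergraph hints at can be quantified as a Pearson correlation coefficient; a stdlib sketch with hypothetical data (and, as noted above, correlation is not causation):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # ~ 1.0 (perfect positive)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # ~ -1.0 (perfect negative)
```

Values near zero would correspond to a scattergraph with no apparent linear association; curvilinear relations can score near zero too, which is one reason the plot itself still matters.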
93. Stock Charts
• Stock graphs (sometimes referred to as OHLC or “open-high-low-close” charts) show the ups and downs in stock valuations over time.
• Stock graphs are referred to as “OHLC” because the data structure is as follows: identifier (whether stock or date or some other identifier), open, high, low, and close.
• The open is the valuation of a stock at the open of the stock session. The high
describes the highest value of the stock in the day-long trading period. The
low refers to the lowest value of the stock in the trading period. The close
defines the closing value in that time period.
94. Stock Charts (cont.)
• The three examples were created from the online Nasdaq historical
data site. Their “quotes” tab enables access to historical prices of
stocks, and only recent datasets were used for the following: The
Boeing Company (BA), Alphabet, Inc. (GOOG), and Tesla, Inc. (TSLA).
• Because all the visualizations are from a single source and of a single type, visual variety was introduced through Excel’s variations on this graph type.
• http://www.nasdaq.com/symbol/ba/historical
• http://www.nasdaq.com/symbol/goog/historical
• http://www.nasdaq.com/symbol/tsla/historical
98. Surface Charts
• Surface graphs are 3-dimensional (3D) graphs with x, y, and z axes.
• The setup for a surface graph requires some early data processing,
not just three sets of data.
• The assumption behind the data in a surface graph is that x and y are
independent variables, and the values should be numeric.
99. Surface Charts (cont.)
The data should be structured as a matrix or what some call a “mesh” because this information will be the underlying data behind the 3D contour.
To build this mesh, the x-axis should be one row of data, and the y-axis should be one column of data. The z-axis (which is the height of various points of the mesh) is drawn as an intersection between x and y (in the green area).
Sparse matrices (those with a lot of empty cells, null values, or zeroes) do not work as well as fully defined ones.
Note that each cell’s formula should refer back to the y-column and the x-row with absolute references ($column letter and $row number).
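The mesh-building steps above can also be done programmatically before the data reaches Excel. This hypothetical Python helper lays out x values across the first row, y values down the first column, and z values at the intersections, then writes a .csv that Excel could open (the function, file name, and z-formula are illustrative assumptions):

```python
import csv

def build_mesh(xs, ys, f):
    """Build a surface-chart 'mesh': x across the top row, y down the
    first column, and z = f(x, y) at each intersection."""
    header = [""] + list(xs)                            # first row: x-axis values
    rows = [[y] + [f(x, y) for x in xs] for y in ys]    # y value, then z cells
    return [header] + rows

xs = [0, 1, 2, 3]
ys = [0, 1, 2]
mesh = build_mesh(xs, ys, lambda x, y: x * y + 1)       # illustrative z function

with open("mesh.csv", "w", newline="") as fh:
    csv.writer(fh).writerows(mesh)
```

A fully defined mesh like this avoids the sparse-matrix problem noted above.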
100. Surface Charts (cont.)
• Surface charts enable the visualizing of some interaction between the
data represented in the x-axis and the y-axis.
• The colors of the surface chart (represented as bands) represent
similar values.
• 3D surface charts may be depicted as wireframe contours, aerial view
contour charts, and others.
• 3D surface charts may be viewed to see overall data patterns. They
may be used to visualize equations. They may be used to find
optimum combinations between two sets of data (represented on the
x and y axes).
101. Surface Charts (cont.)
• 3D visualizations are difficult for people to use because data may be
occluded or difficult to see.
• Data labels are important; legends are important.
• The positioning of the visualization is important.
• The labeling of the three axes is important, so people know what is
represented.
• Excel enables all the above.
• The background behind the data and how the 3D data visualization
was arrived at will be important to help users contextualize the
visualization.
107. Radar Charts
• Radar graphs, also known as spider graphs / charts, show quantitative
measures on axes emanating from a center point.
• Each axis represents a variable.
• In total, the radar graph represents a dataset across multivariate features.
• Radar graphs may be used to compare multiple underlying datasets,
assuming that these are somehow comparable.
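One common way to make multiple datasets “somehow comparable” on shared radar axes is min-max scaling onto a [0, 1] range; a minimal sketch (the helper name is hypothetical):

```python
def minmax_scale(values):
    """Rescale a series to the [0, 1] range so differently-scaled
    radar axes can be compared on a common footing."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Each axis’s variable would be scaled separately before plotting.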
112. Treemap Charts
• Treemap diagrams are rectangular diagrams which convey frequency
in terms of spatial area of smaller rectangles fitted inside the space.
• Treemap diagrams, if they include nested rectangles within the larger
rectangles, are hierarchy charts because they capture the
relationships of the higher vs. the lower levels.
• By convention, the largest rectangles (indicating highest counts by
category) are to the left, and the smallest are to the right.
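The proportional-area idea behind treemaps can be sketched as a small calculation: each category gets area in proportion to its count, ordered largest-first per the convention above (the helper name and sample values are hypothetical):

```python
def treemap_areas(counts, total_area=100.0):
    """Give each category an area proportional to its count, largest first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, total_area * n / total) for name, n in ordered]
```

Actual rectangle placement (the “squarifying” of the layout) is handled by the charting tool itself.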
114.
Word Count
9465008123 5035
amazon 2916
2017 2861
https 2712
com 2063
just 873
like 783
get 771
10154846242183124 734
one 712
order 698
time 650
now 637
company 568
please 541
prime 489
amzn 474
www 473
day 458
10155257276294339 444
know 430
united 402
see 400
states 400
new 396
delivery 389
http 383
seattle 372
customer 369
status 364
even 360
sorry 359
retail 358
122 355
service 355
33207 350
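Word-frequency tables like the one above can be produced with a short script before the data ever reaches Excel; a minimal Python sketch (the function name and sample text are hypothetical):

```python
from collections import Counter

def word_counts(text):
    """Count word frequencies after lowercasing and stripping punctuation."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return Counter(w for w in words if w)
```

The resulting counts can be pasted into a worksheet as two columns (word, count) for charting.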
116. Sunburst Charts
• Sunburst diagrams originated from pie charts. In sunburst diagrams,
variables are depicted as portions of a circular ring.
• Sunbursts are a form of hierarchical chart, which show upper and
lower level interrelationships between elements, such as topics and
sub-topics.
• The elements closest inside the circle are the top-level topics. Farther
out are the sub-topics, sub-sub-topics, and so on. (Or, some may
prefer child topics, grandchild topics, great grandchild topics.) It’s
the differentiation between the levels of information that makes this
a hierarchical chart.
118. Data Structure of the
Sunburst Diagram in
the Prior Slide
Nodes Sub-nodes No. Coding References
account account access 7
account account business days 4
account account details 3
account account info 2
account account information 9
account account issues 3
account account specialist 12
account account specialist email 3
account account today 1
account amazon associate account 3
account bank account 8
account checking account 1
account createspace account 2
account email account 26
Note the hierarchy with the “nodes” and “sub-nodes”.
Note the alphabetization in both text (string) columns.
Note the frequency counts in the “No. Coding References” column.
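The nodes / sub-nodes / counts structure above maps naturally onto a nested dictionary; a sketch using a few rows from the table (the helper name is hypothetical):

```python
def build_hierarchy(rows):
    """Nest (node, sub-node, count) rows into a two-level dictionary."""
    tree = {}
    for node, sub, count in rows:
        tree.setdefault(node, {})[sub] = count
    return tree

# A few rows from the account / sub-node table above.
rows = [
    ("account", "account access", 7),
    ("account", "account specialist", 12),
    ("account", "email account", 26),
]
tree = build_hierarchy(rows)
```

The same two-level nesting is what the sunburst’s inner ring (nodes) and outer ring (sub-nodes) represent visually.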
119.
Name Sources References
beautiful 1 782
day 1 4
employment 1 8
event 1 8
everyone 1 4
flags 1 5
friendly reminder 1 12
good 1 256
great photos 1 175
holiday 1 14
holiday festivities 1 7
holiday lights 1 4
home 1 162
job 1 8
listing 1 7
morning 1 4
offices 1 10
online 1 5
photo 2 453
picture 1 384
place 1 414
post 1 13
road 1 328
state 2 184
state offices 1 8
sunset 1 183
today 1 16
town 1 203
trip 1 361
120.
✔ ✔ apps 1
✔ ✔ game 1
✔ ✔ income jaction 1
✔ ✔ play store 1
delivery date estimated delivery date 8
delivery date false delivery date 2
delivery delivery persons 2
delivery delivery service 2
delivery delivery vehicle 2
delivery estimated delivery date 8
delivery fake delivery log 5
delivery false delivery date 2
delivery outsourced delivery 1
delivery perfect delivery performance 2
delivery poor delivery experience 1
gift gift cards 2
gift great client gift 1
mail mail box 2
mail mail room 3
mail provided prayer rooms 1
office apartments office 1
office post office 2
order confirmation e-mail order confirmation e-mail 11
order current order isnt 3
order order confirmation e-mail 11
order order status 1
service delivery service 2
service design services 2
service seller support service 1
shipping amzl shipping 2
shipping day shipping 1
shipping free shipping 1
121. Histogram Charts
• Histogram charts show the frequency distribution of numerical data
over the comprehensive range of possible values. These are counts of
how many times a certain score appears.
• As such, they give a sense of the density of the data.
• Histograms are generally applied to continuous data. For categorical
data, regular bar charts with spaces between the bars are often used.
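The binning behind a histogram can be sketched in a few lines; this hypothetical helper counts how many values fall into each fixed-width bin:

```python
def histogram_bins(values, bin_width):
    """Count how many values fall into each fixed-width bin; keys are
    the lower edges of the bins."""
    counts = {}
    for v in values:
        b = int(v // bin_width) * bin_width
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))
```

Excel’s histogram chart performs an equivalent binning internally, with the bin width adjustable in the axis options.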
126. Data Structure for the Theme Histogram in the Prior Slide
Themes: A: company, B: engine, C: engineering, D: landing, E: launch, F: mission, G: pad, H: real, I: rocket, J: space, K: spacex, L: stage, M: station, N: system, O: test, P: time, Q: units, R: vehicle, S: work
1: Internals (1) SpaceX: 43, 66, 34, 39, 142, 30, 37, 29, 105, 101, 51, 37, 31, 49, 50, 35, 27, 34, 43
127. Box & Whisker Charts
• Box and whisker diagrams enable the visualization of groups of numerical
data in quartiles (data broken into 25% or one-fourth segments). The
boxes in the boxplots show the range of values in quartiles for that
variable.
• The whiskers—or the lines running from the boxes—show the variability
outside the upper and lower quartiles. The longer the lines, the greater
the variability above the quartile ranges.
• The data mapped in box plots are not assumed to be parametric, so there
is no assumption of underlying statistical distributions.
• Lines within the boxes may indicate the median or midpoint where half the
data is above and half the data is below.
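The quartile structure behind a box plot can be sketched as a five-number summary (minimum, Q1, median, Q3, maximum); a minimal Python version using one common quartile convention (other conventions, including Excel’s options, differ slightly):

```python
from statistics import median

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum -- the backbone of a box plot."""
    s = sorted(values)
    mid = len(s) // 2
    lower = s[:mid]                                   # half below the median
    upper = s[mid + 1:] if len(s) % 2 else s[mid:]    # half above the median
    return s[0], median(lower), median(s), median(upper), s[-1]
```

The box spans Q1 to Q3, the inner line sits at the median, and the whiskers extend toward the minimum and maximum (with outliers shown separately).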
128. Box & Whisker Charts (cont.)
• Skewness shows the tendency of the data, i.e., whether more scores
trend high or trend low.
• A short box means low dispersion or spread (not a large variety in
numbers)…while a long box means high dispersion or spread (a large
variety of numbers).
• Outliers are indicated as dots beyond the whiskers.
• The boxes in box & whisker diagrams may be vertical or horizontal.
132. Data Structure for the Box & Whisker Plot in the Prior Slide (partial snippet)
Hospital Referral Region Description / Total Discharges / Average Covered Charges / Average Total Payments / Average Medicare Payments
AL - Dothan 91 $32,963.07 $5,777.24 $4,763.73
AL - Birmingham 14 $15,131.85 $5,787.57 $4,976.71
AL - Birmingham 24 $37,560.37 $5,434.95 $4,453.79
AL - Birmingham 25 $13,998.28 $5,417.56 $4,129.16
AL - Birmingham 18 $31,633.27 $5,658.33 $4,851.44
AL - Montgomery 67 $16,920.79 $6,653.80 $5,374.14
AL - Birmingham 51 $11,977.13 $5,834.74 $4,761.41
AL - Birmingham 32 $35,841.09 $8,031.12 $5,858.50
AL - Huntsville 135 $28,523.39 $6,113.38 $5,228.40
AL - Birmingham 34 $75,233.38 $5,541.05 $4,386.94
AL - Birmingham 14 $67,327.92 $5,461.57 $4,493.57
AL - Dothan 45 $39,607.28 $5,356.28 $4,408.20
AL - Birmingham 43 $22,862.23 $5,374.65 $4,186.02
AL - Birmingham 21 $31,110.85 $5,366.23 $4,376.23
AL - Mobile 15 $25,411.33 $5,282.93 $4,383.73
AL - Huntsville 27 $9,234.51 $5,676.55 $4,509.11
AL - Mobile 27 $15,895.85 $5,930.11 $3,972.85
AL - Tuscaloosa 31 $19,721.16 $6,192.54 $5,179.38
AL - Mobile 18 $10,710.88 $4,968.00 $3,898.88
AL - Birmingham 33 $51,343.75 $5,996.00 $4,962.45
AL - Birmingham 29 $55,219.31 $5,710.31 $4,471.68
AL - Mobile 66 $14,948.15 $5,550.90 $4,219.90
AL - Birmingham 19 $73,846.21 $4,987.26 $3,944.42
AK - Anchorage 23 $34,805.13 $8,401.95 $6,413.78
AZ - Phoenix 11 $34,803.81 $7,768.90 $6,951.45
AZ - Tucson 40 $24,474.75 $6,799.85 $5,764.87
AZ - Phoenix 18 $28,571.61 $9,133.00 $8,008.11
AZ - Tucson 12 $35,968.50 $6,506.50 $5,379.83
AZ - Tucson 42 $26,294.52 $6,083.42 $4,903.33
AZ - Phoenix 28 $26,771.78 $7,140.85 $6,133.57
AZ - Phoenix 20 $29,967.80 $6,978.75 $5,969.55
AZ - Phoenix 15 $27,349.40 $11,026.33 $9,056.06
AZ - Phoenix 18 $59,443.83 $8,487.44 $7,422.66
133. Waterfall Charts
• Waterfall diagrams (aka “flying bricks chart” or “Mario chart,” or
“bridge” in finance) capture intermediate positive or negative
valuations of something—such as products or services, housing, or
stocks.
• The x-axis may be time, or it may be a variable.
• The y-axis is some sort of measure.
• In some charts, the starting and ending values are shown as full bars,
while the intermediate values float (as floating steps) to various
heights depending on their varying values.
• A waterfall chart may show valuation variance over time.
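The intermediate rises and falls that a waterfall chart plots can be derived from a plain series of valuations; a hypothetical Python sketch (the column names mirror a Base / Fall / Rise / Change layout):

```python
def waterfall_columns(prices):
    """Derive base / fall / rise / change columns from a price series."""
    rows, prev = [], prices[0]
    for p in prices:
        change = round(p - prev, 10)   # round to dodge float artifacts
        rows.append({"base": p,
                     "fall": -change if change < 0 else 0,
                     "rise": change if change > 0 else 0,
                     "change": change})
        prev = p
    return rows
```

Each day’s value lands in either the fall or the rise column, which is exactly what the floating intermediate bars encode.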
134. Waterfall Charts (cont.)
• This graph displays “the cumulative effect of sequentially introduced
positive or negative values” (“Waterfall chart,” Mar. 2017).
• There is a non-naïve assumption that what has occurred before may have
near-term effects on what follows (or is part of a larger affecting trend).
• The depicted variables exist in a context and are in co-relationship.
138. Data Structure for the Waterfall Chart in the Prior Slide
Dates Base Fall Rise Total Changes
4/3/2017 3.95 0 0 0
4/4/2017 3.95 0 0 0
4/5/2017 3.75 0.2 0 -0.2
4/6/2017 3.75 0 0 0
4/7/2017 3.75 0 0 0
4/10/2017 3.8 0 0.05 0.05
4/11/2017 3.85 0 0.05 0.05
4/12/2017 3.7 0.15 0 -0.15
4/13/2017 3.45 0 0.25 0.25
4/17/2017 3.5 0 0.05 0.05
4/18/2017 3.4 0.1 0 -0.1
4/19/2017 3.4 0 0 0
4/20/2017 3.4 0 0 0
4/21/2017 3.5 0 0.1 0.1
4/24/2017 3.5 0 0 0
4/25/2017 3.4 0.1 0 -0.1
This one was made with
the stacked vertical
column chart feature.
These are still not quite
presenting correctly, but
they’re close… The data
is from the Nasdaq
Historical Quotes tool.
139. Combo Chart
• Combination graphs are those which mix data and present the
findings in creative interlinked ways (optimally for new insights).
• Combining data requires finesse because there are ways to introduce
errors when mixing data. Data types may not align. Measures may
not be accurately matched. Some data may be redundant. Etc.
• There are many ways to create these.
• Some of the earlier charts may be “combination” ones as well because of the
integration of multiple variables and / or multiple datasets.
142. 3D Maps Geographical Imagery
• The 3D Maps imagery is related to locational mapping on a digital 3D
globe.
• There should be at least one to two columns of locational information
based on standard names for cities, states (or provinces), and
countries. Regional names are also recognized.
• The spellings of the names, though, should be standard to the tool.
• There may be other columns of related quantitative data related to
the respective locations. This may be time data, demographic data,
or various other relevant information.
143. 3D Maps Geographical Imagery (cont.)
• To set up data for 3D imagery, set up some locations: city,
state/province, country, and say, years of residence.
• Highlight the data.
• Go to Insert -> 3D Maps.
• Adjust the fields for the look-and-feel.
• The maps are interactive (rotatable) and zoomable.
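The data setup described above might look like the following as a .csv written from Python; the place names and years are invented sample values, and the file name is an assumption:

```python
import csv

# Hypothetical sample rows for a 3D Maps exercise: locational columns
# with standard spellings, plus one quantitative column.
rows = [
    ("City", "State/Province", "Country", "Years of Residence"),
    ("Manhattan", "Kansas", "United States", 5),
    ("Seattle", "Washington", "United States", 3),
]
with open("locations.csv", "w", newline="") as fh:
    csv.writer(fh).writerows(rows)
```

Opening this file in Excel, highlighting the data, and choosing Insert -> 3D Maps would then plot the quantities at the recognized locations.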
145. Data Structure for the 3D Image in the Prior
Slide
City State Country Years of Residence
146. Some Tips for Creating Data Visualizations in
Excel 2016
• Do a mental walk-through of the underlying data.
• Consider what it is you want to communicate.
• Create a number of versions of the data visualizations. Experiment
broadly.
• Add data visualization details.
• Add surrounding information to ensure that the data visualization fits
the context.
147. Going “Off-Script” within Excel
Going with data visualization templates in Excel is a very fast way to portray
structured data.
However, there are some creative ways to re-visualize data in Excel by using existing
capabilities.
148. (1) A Composite Multi-Graph Image
• Let’s say that there is a need to create multiple graphs that are
interrelated and need to be exported as one file.
• Simply click on the outside borders of each of the elements, go to the
Page Layout tab, and click on Group. This will treat all the elements as
one group, and will enable clicking on just one part of the image to
“copy” the entire one into a photo editing tool.
• If the elements are not treated as one, then it will be difficult to export the
composite graphs as one with a screen grab (since a screen may not contain
the entire composite image).
• Piecemeal copy-and-paste exports will mean that the elements have to be
recomposed in a tool like Microsoft Visio, with the attendant challenges of
getting everything to align.
149. (2) Back-to-Back Bar Charts
• Begin with a set of relatively comparable data with the same variables
being compared (with a numerical measure).
• Assess the data with a shared measure.
• In Excel, create two separate horizontal bar charts.
• If the results are quite different, rework the horizontal axes to have
the same maximum number (so the two sides have a comparable
base).
• Add data labels for clarity of the bars.
150. (2) Back-to-Back Bar Charts (cont.)
• Create a name label for the data visualization using a text box.
• For one of the two horizontal bar charts, in the “Format Axis,” reverse
the order of the values.
• For the one with reversed values, delete the vertical axis with the
numbers.
• Create a text box with the variables centered.
• Strive to align the two bar charts. (This is easier said than done
because the horizontal bars are not the same thickness necessarily if
the numbers are quite different.)
151. (2) Back-to-Back Bar Charts (cont.)
• Add a white background to the image, so that the Excel cells do not
show up.
• If further cleanup work is needed, drop the image into Photoshop or
another image editing tool, and clean up the image before placing the
image.
• Once an Excel graph is made into an image, it is no longer machine-
readable or screen-reader accessible, so informationally equivalent alt
text should be included to ride along with the image.
152. (2) A Rough Example of a Back-to-Back Bar
Chart
153. (3) A Stacked Pyramid Chart
• Create a list of frequency data.
• Highlight the frequency data, and sort from largest to smallest. Be
sure to extend the selection, so the data labels move with the correct
frequency amounts.
• Intersperse lines between each row, and put in a placeholder amount
(say, 100 for the amount).
• Highlight the data, and insert a 3D 100% stacked column chart.
• Highlight the data columns and right-click. In the Format Data Series
window, select “Full Pyramid.”
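The intersperse-placeholder step above can be sketched in code; this hypothetical helper inserts a spacer row (with a placeholder amount of 100) between each pair of data rows, and the labels in the example are invented:

```python
def intersperse_placeholders(rows, spacer=100):
    """Insert a placeholder row between data rows; the spacer layers are
    later given 'No fill' to create visual gaps in the stacked pyramid."""
    out = []
    for label, value in rows:
        out.append((label, value))
        out.append(("", spacer))
    return out[:-1]   # no spacer needed after the last data row
```

The spacer value can later be adjusted (per the step above) to tune the apparent gap between pyramid layers.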
154. (3) A Stacked Pyramid Chart (cont.)
• With the chart highlighted, go to the Design tab, and click “Switch
Row/Column.” The separate columns will coalesce into one pyramid.
• Click on the left axis (100% to 0%), and select “Format Axis.” In the
“Format Axis” window at the right, select “Values in reverse order.”
• In the chart area, select any visual elements that are not desired, and
press Delete to remove them.
• Click the “plus” at the right of the chart and add elements that are desired
(such as a Legend).
• Adjust the size of the separators from 100 to another consistent number to
create the sense of space between the reverse pyramid elements.
155. (3) A Stacked Pyramid Chart (cont.)
• Right-click one of the placeholder layers in the visualization, and go to
the Format Data Series window. In the “Fill” tab, select “No fill.” Do
this for each of the placeholder layers to give a sense of physical
distance between each of the actual data layers.
• The “Enrollment Summary by College” data in the following table
comes from the Office of the Registrar at Kansas State University, at
http://www.k-state.edu/registrar/statistics/colleges.html. This is
from 2016.
• This data visualization type may align with sequential or pipeline data
as well as others.
158. Some Common Mistakes
• Not ensuring that the underlying data behind a data visualization is
correct
• A lack of alignment and fit between the underlying data and the data
visualization form
• Going with a data visualization only because the software seems to enable
it…but not working through the visualization to make sure that it makes sense
both visually and data-wise
• Confusing rates with actual measures, and others
• Combining non-comparable data types
• Having data in a cell which is not identified by accurate type (such as
“date” information as “general” data or “number” information as
“text” data)
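The cell-type mistake noted above (“number” information stored as “text”) can be caught programmatically before charting; a hypothetical Python sketch:

```python
def coerce_numeric(cell):
    """Detect 'number stored as text' and coerce it to a float;
    return None if the cell is genuinely non-numeric."""
    if isinstance(cell, (int, float)):
        return float(cell)
    try:
        # Strip common formatting debris before converting.
        return float(str(cell).replace(",", "").replace("$", ""))
    except ValueError:
        return None
```

Cells that come back as None warrant a manual look, since a chart built on text-typed numbers will silently misbehave.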
159. Some Common Mistakes (cont.)
• An incoherent data visualization enabling a wide variety of
misinterpretations (or conflicting data in a data visualization)
• Insufficient data visualization context
• Poor labeling of data: insufficient labels, inaccurate labels, non-
neutral language, illegibility, and / or others
• Not spell checking data visualizations
• Not studying the conventions of the data visualization
• Assuming that viewers have the same level of background knowledge
as the creator of the data visualization
160. Some Common Mistakes (cont.)
• Excess data in the data visualization (such as extra decimal places on
whole numbers, leaving many .00s)
• A 2D or 3D data visualization with excessive data and data element
occlusion
• A 4D data visualization with pacing that is too fast or too slow (or
which does not enable viewer pacing or control)
161. One Main Realization
• The work to conduct the research and to acquire the actual data takes
about 95% of the effort and time, and drawing the data visualization
takes about 5% of the effort…but the data visualization piece is also
critical (because a lot can be compromised with improper drawing of
the data visualization).
162. Adding Relevant Data
Visualization Elements
Data visualizations should be as simple as possible, with no extraneous elements
that do not contribute to the overall meaning of the chart.
163. Common Data Visualization Elements
• A clear noun-phrase title
• Labels for the x- and y-axes (and sometimes y1 and y2
axes)
• Data labels
• Gridlines
• A data table (for some data visualizations)
• A legend
• Error bars
• Trendlines, and others
164. Graph Styles
• Various style versions of the target graph
• Background styles
• Object handling
• Texturing of objects and shapes
• Font types and styles
• 2D vs. 3D, and others
165. Range of Color Palettes
• Ability to add a variety of colors in
palettes that are aesthetically
pleasing and of sufficient contrast
for visual accessibility
• Color palettes may be selected by
dominant colors
166. To Change Graph Colors…
• To change the colors of the plot, highlight the
plot.
• In the Design tab of the ribbon, select “Change
Colors.” A dropdown menu will enable the
selection from a variety of color palettes.
The palettes are divided into two sections:
colorful (polychromatic and contrastive) and
monochromatic (different shades of a
particular color, often in gradients).
167. To Select Custom Colors…
• Custom colors may be applied to particular elements. Just right click
the element, and change the fill color.
169. Dropdown Menus with Additional Options
• Users have a high level of
control for the look, feel, and
function of the chart / graph
elements.
170. MS Excel’s Page Layout Features
• Excel has a variety of layout features that may enable in-graph
editing.
• Some of the features of this tab include the following:
• Pre-built themes
• Backgrounds
• Scaling and sizing
• Gridlines
• Arrangements (bringing forward, sending back)
• Auto alignment choices
• Grouping, and others
173. Several Main Ways to Export Excel Charts
Copy and Paste as a Linked Graph
• Can export data visualizations as a
copy and paste (which will maintain
the link to the original file—as long as
all the respective files’ locations are
not changed)
• Copied and pasted charts will
maintain an alpha channel behind
visual elements (so there is an
invisible layer with 100%
transparency)
• Colors of the data graphs will change
in PowerPoint based on the applied
design styles and color palettes
Save as Template
• Can export data templates for
use later on
175. Several Main Ways to Export Excel Charts (cont.)
Copy and Paste as an Image into a
Digital Image Editing Software Program
• Can copy the graph by clicking on its outer edges, pressing CTRL + C
(to save the image to the Windows clipboard), pasting into Adobe
Photoshop, changing the resolution, contrast, and aspect ratio as
needed, and then exporting / saving the image as a .png, .tif, .jpg,
.gif, or some other format
Copy and Paste as an Image into a
Diagramming / Drawing Software Program
• Can copy the graph as an image into a diagramming / drawing software
program (like Microsoft Visio), add image overlays, and then output in
the proprietary file format and then as a digital image
176. Microsoft Visio
• For example, MS Visio offers the following: pre-made templates,
forms, containers, call-outs, connectors, and others
• There are overlays of shapes, text boxes, lines, fonts, and others
• Shapes may be highlighted and operations may be applied to them: union,
combine, fragment, intersect, subtract, join, trim, and others…through an
activate-able Developer tab
• To offer more control, users have gridlines, drag-able guidelines,
automated positioning and alignment, grouping features, aspect-ratio
controls, and others
• Color-based themes and variants
184. What are Add-ins?
• Add-ins are software programs built to function with Excel to add
various types of functionalities: data analytics, data visualizations, QR
code generation, expanded export file types, and others
• Add-ins / add-ons are helpful because they add functionality to
software that is already somewhat familiar
185. Where Can One Find Add-ins for Excel?
• Some of the Excel add-ins are from Microsoft Research and may be
activated within the tool.
• Some are available for download from the Office Store (such as a free
Streamgraph drawing add-on that creates area charts that vary over time).
• Others are related to software programs (like Acrobat PDF) and
enable richer ways to share / interchange file types.
• Some are downloadable from CodePlex and GitHub (like Network
Overview, Discovery and Exploration for Excel or “NodeXL”), for social
media platform data extraction, network analysis, network graph
drawing, and other capabilities.
186. Where Can One Find Add-ins for Excel? (cont.)
• The steps for accessing add-ins differ by type.
• Those that are built into the tool will require mere activation, if that.
• Those that come with other software programs will also require mere activation, if that.
• Some will require a download and some installation.
• Some will require a download but may auto-install.
187. Activating Add-ins
• In Excel, click the File tab.
• Click Options. The Excel Options window opens.
• Click “Add-ins” in the left menu.
• A list of available add-ins will display in the window, in several
categories:
• Active Application Add-ins
• Inactive Application Add-ins
• Document Related Add-ins
• Disabled Application Add-ins
188. Activating Add-ins (cont.)
• Select an add-in of interest, and click “Go” at the bottom.
• An “Add-in” window will open allowing the user to check certain
boxes to activate or to uncheck boxes to de-activate.
• Click “OK” once the selections are decided.
• These are global settings, and the add-ins should be available for
future uses of Excel.
192. About Data
• Data…
• Has to be collected somewhere advertently or inadvertently
• Has to be practically applied in some way (strategic, tactical, other)
• May be pre-labeled or post-labeled
• Structured data datasets include the following:
• What a thing is (data type) and generally how it relates to everything else
• Dataset metadata include the following:
• How the data was collected (hopefully with high standards and finesse)
• When the data was collected
• Who collected the data and how the source should be cited
193. About Data (cont.)
• Dataset metadata may be captured in data dictionaries if the dataset
is a larger one
• The fact that data is in the same set means there is some relatedness
whether you can see it or not (or you may have brought unrelated
contents into a dataset and are seeing relations that may not exist)
• Handling data requires finesse:
• Data handling should be back-stopped by protected raw datasets which are
left pristine and unprocessed (so researchers can always grab another set to
process differently)
• How you clean and handle it matters (handling can introduce artifacts,
mistakes, and skews)
• Researchers can’t afford to be sloppy or unthinking
194. About Data (cont.)
• Having access to a data table or a dataset can give a deceptive sense
of understanding
• Data has to be understood from a deep background in the subject matter
• Data has to be understood in the context of larger sets of data that may be
cross-referenced expertly
• Fragmentary data reveals in some cases and obfuscates in others
196. Some Common Standards for Data
Visualizations
• Data accuracy (underlying data;
proper contextualization; source
citations; disambiguation;
correction of errors; non-
manipulation of data consumers;
differentiation between
empirical, conceptual, and
synthetic data)
• Intellectual property protections
(copyright)
• Privacy protections (protection
against re-identification of de-
identified data)
• Proper crediting of all sources
• Accessibility through file
versioning, alt texting, access to
underlying databases, and
captioning
197. Some Common Standards for Data
Visualizations (cont.)
• Human and machine readability
of data tables
• Contextualizing
198. Contact and Conclusion
• Dr. Shalin Hai-Jew
• iTAC
• Kansas State University
• 785-532-5262
• shalin@k-state.edu
• Note:
• The data sources have generally been cited close to the data visualization.
• The presenter has no relationship to any of the software makers.