This presentation describes how the National Centers for Environmental Information (NCEI) turns messy raw observations into usable datasets, using global surface temperature (GHCN) and radar rainfall estimates as examples. For rainfall, data from multiple radars are merged to provide nationwide coverage at 2-minute resolution on a standardized grid. Raw radar returns are cleaned of contamination from birds and other sources, and the vertical radar scans are integrated into a single rainfall estimate. The final product is a consistent, high-quality rainfall estimate, whereas the original radar data had poorer spatial and temporal resolution and quality issues.
CLIM Undergraduate Workshop: How was this Made?: Making Dirty Data into Something Usable at NCEI - Stevens & Rennie, Oct 24, 2017
How were these made?
Turning dirty data into something usable at NCEI
Jared Rennie and Scott Stevens
SAMSI Undergraduate Workshop
October 24th, 2017
What We’re Taught
Name How Tall are You (inches)?
Adam 66.2
Barbara 63.4
Clint 69.5
Debra 65.3
Easton 71.5
Frances 68.5
George 64.2
Hilda 61.2
Idris 74.5
What We Get
Name How Tall are You (inches)?
Adam 66 inches
Barbara 63.4287
Clint 176.5
Debra
Easton 41.5
Frances 68.5
George 5’4”
Hilda 61.2
Idris 210 lbs
What We Get
Name How Tall are You (inches)?
Adam 66 inches Added on some extra units
Barbara 63.4287 Has too much confidence in her ruler’s precision
Clint 176.5 Gave it in cm instead of inches
Debra Just didn’t answer
Easton 41.5 Made a typo
Frances 68.5
George 5’4” Ignored the instructions
Hilda 61.2
Idris 210 lbs Answered a different question entirely
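Each error type in the table above maps onto a cleaning rule. Below is a minimal Python sketch of how one might clean these survey answers; the function name `clean_height`, the plausibility bounds (54–90 inches, 140–230 cm) and the note strings are illustrative assumptions, not anything NCEI uses.

```python
import re
from typing import Optional, Tuple

CM_PER_INCH = 2.54
# Hypothetical plausibility bounds for an adult height.
INCH_RANGE = (54.0, 90.0)
CM_RANGE = (140.0, 230.0)


def clean_height(raw: str) -> Tuple[Optional[float], str]:
    """Return (height in inches or None, note describing what was done)."""
    text = raw.strip()
    if not text:
        return None, "missing"
    # Answered a different question: a weight instead of a height.
    if re.search(r"\b(lbs?|pounds?|kg)\b", text, re.IGNORECASE):
        return None, "wrong quantity (weight)"
    # Feet and inches, e.g. 5'4" or 5’4”.
    ft_in = re.fullmatch(r"(\d+)\s*['’]\s*(\d+(?:\.\d+)?)\s*[\"”]?", text)
    if ft_in:
        inches = 12 * int(ft_in.group(1)) + float(ft_in.group(2))
        return inches, "converted from feet/inches"
    # A bare number, optionally followed by an inch unit.
    num = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(inches|inch|in)?", text, re.IGNORECASE)
    if num is None:
        return None, "unparseable"
    value = float(num.group(1))
    if CM_RANGE[0] <= value <= CM_RANGE[1]:
        return round(value / CM_PER_INCH, 1), "converted from cm"
    if not INCH_RANGE[0] <= value <= INCH_RANGE[1]:
        return value, "suspect (possible typo)"
    if num.group(2):
        return round(value, 1), "stripped units"
    if round(value, 1) != value:
        return round(value, 1), "rounded to 0.1 in"
    return value, "ok"


for name, answer in [("Adam", "66 inches"), ("Clint", "176.5"), ("George", "5’4”")]:
    print(name, clean_height(answer))
```

Note that a typo like Easton’s 41.5 can only be flagged, not corrected, without going back to the person who answered.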
TL;DR
• A lot of work goes into our data before most
scientists ever even see it
• Most of the “controversy” regarding climate
change isn’t around the analysis, but the data
• What we start with is messy and of little
value to most scientists, so we have to make
it more user-friendly
– And we have to carefully document how we do it
National Centers for Environmental Information
• Located in downtown Asheville, NC.
• Holds over 30 petabytes of weather and
climate data from all over the world.
Cooperative Institute for Climate and Satellites
• Research institution working with NCEI.
• Made up of meteorologists, software
developers, and science communicators.
• Vision
– Inspire cutting-edge research and collaboration.
– Advance understanding of the current and
future state of the climate.
– Engage with business, academia, government,
and the public to enhance decision making.
That’s great… but
• These are the final products.
• A lot of hard work goes into generating these
“pretty images.”
– Math, statistics, meteorology, physics, orbital
mechanics, instrumentation, engineering, computer
programming, visualization, communication.
• And most importantly…
Surface Temperature: The Beginnings
• The instrumental record of temperature
has its roots in the development of
universal temperature scales in the 18th
century.
– The monthly mean temperature series for De Bilt, Netherlands,
extends from 1706 to the present.
– Several other long European series exist
going back over 200 years.
• Throughout the 1800s, measurements
expanded across other continents.
– National Meteorological and Hydrological
Services (NMHS) around the world have
operated networks to support weather
and climate observations since the late
19th century.
Surface Temperature: The Beginnings
• In the 1980s and 1990s, major efforts were
made to collect observations and create
consolidated global datasets.
– Global Historical Climatology Network – Monthly
(GHCN-M)
• Focus shifted to daily data in the 2000s.
– Global Historical Climatology Network – Daily
(GHCN-D)
Surface Temperature: The Beginnings
• GHCN-M
– In situ data for 7,280 stations in 226 countries.
– Monthly maximum, minimum, and average
temperature.
– Automated ingest, quality control (QC), and bias correction.
• GHCN-D
– Integrated database of daily climate summaries.
– TMAX, TMIN, PRCP, SNOW, SNWD, others.
– 100,000 stations globally (~30,000 temperature).
Surface Temperature: The Beginnings
• GHCN and other datasets are the foundation
for understanding trends in global surface
temperature:
– NCEI: GHCN-M / GHCN-D (land); NOAAGlobalTemp (land + ocean)
– UK Met Office: CRUTEM (land); HadCRUT (land + ocean)
– NASA GISS: GISS
How did this get made?
1. Collect weather stations from around the globe.
2. Merge similar stations together.
3. Run through quality control.
4. Perform homogenization.
5. Grid station data.
6. Create departure from average.
Merge Program
• An iterative process between the first (target) source
and all candidate sources.
• Each candidate station is compared to all the target
stations to determine which of the following three
possibilities applies:
– Candidate station should merge with target station.
– Candidate station is unique, and should be added
as a new station in the target source.
– Not enough information; station is withheld.
Reference: Rennie et al. (2014)
Merge Program
• Metadata Tests
– Probability of station match is calculated based
upon three criteria:
• Geographic distance
• Elevation difference
• Station name similarity using the Jaccard index
– Size of the intersection divided by size of the union
– A final weighted probability is calculated; if the station
passes this test, it moves on to the second (data) test,
otherwise it is withheld.
Reference: Rennie et al. (2014)
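A minimal sketch of the metadata test, assuming one way to turn the three criteria into a weighted probability. The Jaccard index follows the slide’s “intersection divided by union”, here applied to character bigrams of the names (an assumption). The haversine distance, the exponential decay scales (hypothetical `DIST_SCALE_KM`, `ELEV_SCALE_M`), the `WEIGHTS` and the 0.5 `THRESHOLD` are illustrative choices, not the values used by Rennie et al. (2014).

```python
import math

# Hypothetical tuning constants; not the values from Rennie et al. (2014).
DIST_SCALE_KM = 10.0       # distance at which the distance score falls to 1/e
ELEV_SCALE_M = 50.0        # elevation difference at which the score falls to 1/e
WEIGHTS = (0.4, 0.2, 0.4)  # distance, elevation, name
THRESHOLD = 0.5


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def bigrams(name: str) -> set:
    """Character bigrams of an upper-cased name with whitespace removed."""
    s = "".join(name.upper().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard index: |intersection| / |union| of the two bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def metadata_match_probability(target: dict, candidate: dict) -> float:
    """Weighted combination of distance, elevation and name scores in [0, 1]."""
    dist = haversine_km(target["lat"], target["lon"], candidate["lat"], candidate["lon"])
    p_dist = math.exp(-dist / DIST_SCALE_KM)
    p_elev = math.exp(-abs(target["elev"] - candidate["elev"]) / ELEV_SCALE_M)
    p_name = jaccard(target["name"], candidate["name"])
    w_dist, w_elev, w_name = WEIGHTS
    return w_dist * p_dist + w_elev * p_elev + w_name * p_name


def passes_metadata_test(target: dict, candidate: dict) -> bool:
    """True if the candidate moves on to the data tests; False means withheld."""
    return metadata_match_probability(target, candidate) >= THRESHOLD


# Made-up example stations.
asheville = {"lat": 35.43, "lon": -82.54, "elev": 645.0, "name": "Asheville"}
asheville_ap = {"lat": 35.44, "lon": -82.54, "elev": 650.0, "name": "Asheville AP"}
print(passes_metadata_test(asheville, asheville_ap))
```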
Merge Program
• Data Tests
– Compare overlapping data using the Index of
Agreement (IA).
– From the IA, calculate the probability of a station match
and the probability of station uniqueness.
– These probabilities determine the final fate of the station.
Reference: Rennie et al. (2014)
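The data test can be sketched assuming the Index of Agreement is Willmott’s d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², computed on overlapping, non-missing values with the target as O and the candidate as P. How d is turned into a decision below (hypothetical `MIN_OVERLAP`, `MERGE_IA` and `UNIQUE_IA` thresholds, with ambiguous cases withheld) is an assumption standing in for the match and uniqueness probabilities of Rennie et al. (2014).

```python
from typing import List, Optional, Sequence

# Hypothetical decision thresholds (not from Rennie et al. 2014).
MIN_OVERLAP = 12   # minimum number of overlapping non-missing values
MERGE_IA = 0.95    # at or above: treat as the same station
UNIQUE_IA = 0.70   # at or below: treat as a distinct station


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement d (1 = perfect agreement, 0 = none)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 if den == 0 else 1.0 - num / den


def merge_decision(target: List[Optional[float]], candidate: List[Optional[float]]) -> str:
    """Return 'merge', 'unique' or 'withhold' for one target/candidate pair."""
    pairs = [(t, c) for t, c in zip(target, candidate) if t is not None and c is not None]
    if len(pairs) < MIN_OVERLAP:
        return "withhold"  # not enough information
    ia = index_of_agreement([t for t, _ in pairs], [c for _, c in pairs])
    if ia >= MERGE_IA:
        return "merge"
    if ia <= UNIQUE_IA:
        return "unique"
    return "withhold"  # ambiguous
```

Withholding the ambiguous middle band mirrors the slide’s third outcome: when the data neither clearly agree nor clearly disagree, the station is set aside rather than guessed at.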
How did this get made?
1. Collect weather stations from around the globe.
2. Merge similar stations together.
3. Run through quality control.
4. Perform homogenization.
5. Grid station data.
6. Create departure from average.
Quality Control Checks
• E (Inter-station Duplicate Check): Identifies different stations that have the same 12 monthly values for a given year.
• I (Internal Consistency Check): Looks for cases where TMIN > TMAX.
• L (Isolated Value Check): Value has 18 months of missing data on either side.
• O (Climatological Outlier Check): Monthly value is >= 5 standard deviations from the mean.
• S (Spatial Consistency Check): Any monthly value between 2.5 and 5 standard deviations from the mean is compared against the closest neighbors within 500 km; z-scores determine whether it fails.
• W (Month-over-month Check): Data from non-US countries ingested via satellite are flagged if they duplicate values from previous months.
Reference: Lawrimore et al. (2011)
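Several of these checks are simple enough to sketch directly. The Python below implements flags I, L, O and E on monthly series stored as lists (`None` = missing). Reading “either side” as “both sides”, treating months outside the record as missing, computing the outlier statistics leave-one-out within each calendar month, and assuming all stations’ series start in the same year are simplifying assumptions; the spatial (S) and month-over-month (W) checks are omitted.

```python
import statistics
from typing import Dict, List, Optional, Tuple

# Monthly series: index 0 is January of the first year; None marks missing data.
Series = List[Optional[float]]


def check_internal_consistency(tmax: Series, tmin: Series) -> List[int]:
    """Flag I: indices where TMIN > TMAX."""
    return [i for i, (hi, lo) in enumerate(zip(tmax, tmin))
            if hi is not None and lo is not None and lo > hi]


def check_isolated(series: Series, gap: int = 18) -> List[int]:
    """Flag L: values with `gap` missing months both before and after them.

    Months outside the record are treated as missing (an assumption).
    """
    def missing(j: int) -> bool:
        return j < 0 or j >= len(series) or series[j] is None

    return [i for i, v in enumerate(series)
            if v is not None
            and all(missing(i - k) for k in range(1, gap + 1))
            and all(missing(i + k) for k in range(1, gap + 1))]


def check_outlier(series: Series, threshold: float = 5.0) -> List[int]:
    """Flag O: value >= `threshold` standard deviations from the mean.

    Mean and SD come from the other values of the same calendar month
    (a leave-one-out simplification).
    """
    flagged = []
    for i, v in enumerate(series):
        if v is None:
            continue
        others = [x for j, x in enumerate(series)
                  if j % 12 == i % 12 and j != i and x is not None]
        if len(others) < 2:
            continue
        sd = statistics.stdev(others)
        if sd > 0 and abs(v - statistics.mean(others)) >= threshold * sd:
            flagged.append(i)
    return flagged


def check_duplicates(stations: Dict[str, Series]) -> List[Tuple[str, str, int]]:
    """Flag E: pairs of stations with identical, complete 12-month years.

    Assumes every series starts in the same year, so year k lines up.
    """
    first_seen: Dict[Tuple[int, tuple], str] = {}
    hits = []
    for sid, series in stations.items():
        for year in range(len(series) // 12):
            block = tuple(series[12 * year:12 * year + 12])
            if any(x is None for x in block):
                continue
            key = (year, block)
            if key in first_seen:
                hits.append((first_seen[key], sid, year))
            else:
                first_seen[key] = sid
    return hits
```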
How did this get made?
1. Collect weather stations from around the globe.
2. Merge similar stations together.
3. Run through quality control.
4. Perform homogenization.
5. Grid station data.
6. Create departure from average.
Homogenization
• Looks for non-climatic shifts in temperature.
• Examples include
– Station moves
– Instrument changes
– Errors in observation
• Applied to monthly data using the Pairwise
Homogeneity Algorithm (PHA).
• Daily homogenization exists, but only in a limited
fashion.
Homogenization
• Pairwise Homogeneity Algorithm (PHA)
– Identifies the 100 nearest neighbors, calculates
anomalies, and forms target-minus-neighbor difference series.
– Statistically significant shifts in mean are identified
as change points.
• Uses the Standard Normal Homogeneity Test (SNHT).
Reference: Menne and Williams (2009)
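The core of the change-point step can be sketched as a single-shift SNHT on a target-minus-neighbor difference series. After standardizing the series to z-scores, T(k) = k·z̄₁² + (n − k)·z̄₂², where z̄₁ and z̄₂ are the mean z-scores of the first k and the remaining n − k values; the largest T(k) marks the most likely shift. This is a simplified sketch, assuming one neighbor, the population standard deviation, and no critical-value test; the actual PHA works across many neighbor pairs and attributes each change point to the station responsible.

```python
import statistics
from typing import List, Tuple


def difference_series(target: List[float], neighbor: List[float]) -> List[float]:
    """Target-minus-neighbor anomalies; the shared climate signal cancels out."""
    t_mean = statistics.fmean(target)
    n_mean = statistics.fmean(neighbor)
    return [(t - t_mean) - (n - n_mean) for t, n in zip(target, neighbor)]


def snht(series: List[float]) -> Tuple[int, float]:
    """Return (k, T_max): the most likely shift occurs after the first k values."""
    n = len(series)
    if n < 2:
        return 0, 0.0
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return 0, 0.0
    z = [(x - mu) / sd for x in series]
    best_k, best_t = 0, 0.0
    for k in range(1, n):
        z1 = sum(z[:k]) / k
        z2 = sum(z[k:]) / (n - k)
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t
```

In practice `best_t` would be compared against a critical value that depends on n before the break is accepted as statistically significant.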
Homogenization
• Homogenization Example: Central Park
“As is clearly evident, adjustments made the dust bowl period cooler, while post 1995 had no adjustments applied. This results in a temperature trend that is steeper because the past is cooler than the present. The only problem is that it isn’t what the data actually recorded then.
I think maybe we need to coin a new term for NOAA NCDC – ‘dust bowl deniers’. Yes it appears there is man made warming underway but the men are in Asheville, North Carolina at NOAA’s National Climatic Data Center.”
Homogenization
• Metadata is important to help identify items
such as station moves, instrumentation
changes, etc.
• Sometimes unknown.
• Historical Observing Metadata Repository (HOMR).
– Equipment was updated in 1995 with a better-calibrated
temperature sensor.
1995-06-27 Update: CHANGE NWS MANAGER
1995-06-27 Update: 10. CHG EQUIP, OBSVR, ADDR, PHONE, LAT/LON
How did this get made?
1. Collect weather stations from around the globe.
2. Merge similar stations together.
3. Run through quality control.
4. Perform homogenization.
5. Grid station data.
6. Create departure from average.
Gridding Method
• Thin-Plate Smoothing Spline Method.
• Uses the Australian National University Splines
(ANUSPLIN) package.
[Figure: thin-plate smoothing spline equation, with labeled terms: variable (temperature), lat/lon, smoothing parameter, penalty function]
Reference: Vose et al. (2014)
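ANUSPLIN itself is a Fortran package, so the Python sketch below swaps in a bare-bones thin-plate smoothing spline to show the idea. It fits f(x, y) = a₀ + a₁x + a₂y + Σ wᵢ·φ(‖(x, y) − pᵢ‖) with φ(r) = r² log r, adding the `smoothing` parameter to the diagonal: 0 interpolates the stations exactly, while larger values trade fidelity to the stations for a smoother surface. Treating lat/lon as flat x/y coordinates, omitting elevation, and choosing the smoothing value by hand rather than by an automatic criterion are all simplifying assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _phi(r: float) -> float:
    """Thin-plate radial basis r^2 log r, with phi(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_tps(points: Sequence[Point], values: Sequence[float],
            smoothing: float = 0.0) -> Callable[[float, float], float]:
    """Fit f(x, y) = a0 + a1*x + a2*y + sum_i w_i * phi(|(x, y) - p_i|)."""
    n = len(points)
    size = n + 3
    A = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            A[i][j] = _phi(math.hypot(xi - xj, yi - yj))
        A[i][i] += smoothing  # 0 -> exact interpolation
        for k, p in enumerate((1.0, xi, yi)):  # affine part and its constraints
            A[i][n + k] = p
            A[n + k][i] = p
    coef = _solve(A, list(values) + [0.0, 0.0, 0.0])
    w, (a0, a1, a2) = coef[:n], coef[n:]

    def predict(x: float, y: float) -> float:
        bend = sum(wi * _phi(math.hypot(x - px, y - py)) for wi, (px, py) in zip(w, points))
        return a0 + a1 * x + a2 * y + bend

    return predict
```

With `smoothing=0` the surface passes through every station value; with a large value it relaxes toward a best-fit plane, which is the trade-off the smoothing parameter and penalty function control.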
How did this get made?
1. Collect weather stations from around the globe.
2. Merge similar stations together.
3. Run through quality control.
4. Perform homogenization.
5. Grid station data.
6. Create departure from average.
What Is an Anomaly?
• Departure from a reference value.
– Normalizes the data.
• Diagnostic tool that provides a big picture
overview of temperatures compared to a
reference value.
• Removes regional influences, such as mountains and
deserts, so stations with very different local climates
can be compared directly.
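An anomaly is just a value minus a reference mean. A minimal sketch, assuming annual values keyed by year and a user-chosen base period (the slide does not name one), shows why anomalies remove regional influences:

```python
from statistics import fmean
from typing import Dict


def anomalies(temps: Dict[int, float], base_start: int, base_end: int) -> Dict[int, float]:
    """Departure of each year's value from the mean over [base_start, base_end]."""
    base = [t for year, t in temps.items() if base_start <= year <= base_end]
    if not base:
        raise ValueError("no data in the base period")
    reference = fmean(base)
    return {year: t - reference for year, t in temps.items()}


# Made-up annual means (°C): a cold mountain site and a warm valley site
# warming by the same amount.
mountain = {2000: 5.0, 2001: 5.5, 2002: 6.0}
valley = {2000: 15.0, 2001: 15.5, 2002: 16.0}
print(anomalies(mountain, 2000, 2001))
print(anomalies(valley, 2000, 2001))
```

Both calls return the same anomalies even though the absolute temperatures differ by 10 °C, so the shared warming signal is visible while the elevation difference drops out.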
How much did it rain here?
● How do we get from
this to a rainfall
estimate?
● What potential errors
or roadblocks can you
see?
Radar Sees Everything
[Figure: radar image with callouts: “Resolution poor at greater distances from radar”; “Probably migrating geese”; “Actually closer to Wilmington radar. They probably disagree.”]
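Resolution degrades far from the radar because the beam is a cone: its cross-beam width grows linearly with range, w = 2r·tan(θ/2) ≈ rθ. The sketch below assumes a nominal beamwidth of about 1° (`BEAMWIDTH_DEG` is an assumption; the exact value depends on the radar).

```python
import math

# Assumed nominal beamwidth in degrees; check the specification of the radar in question.
BEAMWIDTH_DEG = 1.0


def beam_width_km(range_km: float, beamwidth_deg: float = BEAMWIDTH_DEG) -> float:
    """Cross-beam width of a conical beam at the given range: 2 r tan(theta / 2)."""
    return 2.0 * range_km * math.tan(math.radians(beamwidth_deg) / 2.0)


for r_km in (10, 50, 100, 250):
    print(f"{r_km:>3} km from the radar: beam about {beam_width_km(r_km):.2f} km wide")
```

At the 250 km edge of coverage, a 1° beam is already more than 4 km across, which is why distant echoes look blocky.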
Before and After
• Spatial extent
– Before: 250 km for each radar
– After: Nationwide
• Time resolution
– Before: About six minutes for each radar, not synced with each other at all
– After: 2 minutes
• Coordinates and resolution
– Before: Polar; widens with distance
– After: Cartesian lat/lon grid
• Vertical resolution
– Before: ~19 vertical tilts
– After: Single integrated estimate
• Variable available
– Before: Radar reflectivity (dBZ)
– After: Rain rate (mm/hr)
• Data quality
– Before: Contaminated with birds and wind farms and stuff
– After: Not contaminated with birds and wind farms and stuff
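Going from reflectivity (dBZ) to rain rate (mm/hr) is usually done with a Z–R power law, Z = a·Rᵇ. The sketch below uses the classic Marshall–Palmer coefficients (a = 200, b = 1.6) as an assumed example; the slides do not say which relation the NCEI product uses, and operational systems typically vary the coefficients by precipitation type. It also ignores the bird/wind-farm filtering and the merging of tilts and radars.

```python
import math

# Marshall–Palmer coefficients for Z = a * R**b (a common default, used here as an assumption).
A, B = 200.0, 1.6


def dbz_to_z(dbz: float) -> float:
    """Reflectivity factor Z in mm^6 m^-3 from dBZ, where dBZ = 10 log10(Z)."""
    return 10.0 ** (dbz / 10.0)


def rain_rate_mm_per_hr(dbz: float, a: float = A, b: float = B) -> float:
    """Invert Z = a R^b for the rain rate R in mm/hr."""
    return (dbz_to_z(dbz) / a) ** (1.0 / b)


for dbz in (20, 30, 40, 50):
    print(f"{dbz} dBZ -> {rain_rate_mm_per_hr(dbz):.1f} mm/hr")
```

Because R grows like Z^(1/1.6), each 10 dBZ step multiplies the estimated rain rate by 10^(1/1.6), about 4.2, so small reflectivity errors (or a flock of geese) translate into large rain-rate errors.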