6 measurement&representation

Measurement and Representation of
Hydrological Quantities
Leonardo da Vinci - Vitruvian Man, ca 1487
photo by Luc Viatour, www.lucnix.be

Riccardo Rigon
Sunday, September 12, 2010

Measurement and Representation of Hydrological Quantities

Objectives:

• These pages use examples to discuss the spatio-temporal variability of
measurements of hydrological quantities.

• It follows that statistical tools are needed to describe these
quantities.




Frickenhausen, on the River Main
Hydrometric Height




Hydrological Data have Complex Trends 1/2
The hydrological cycle is controlled by innumerable factors: hence it depends
on innumerable degrees of freedom. Only a small portion of these factors can be
taken into consideration, while the rest must be modelled as a
boundary condition or as “background noise” (this noise is either modelled or
eliminated with statistical tools).

The dynamics of the hydrological cycle are non-linear. Both the hydrodynamics
and the thermodynamics of the processes, which involve numerous phase
changes, are non-linear. A further source of non-linearity is that many of these
processes are activated when some regulating quantity exceeds a
threshold value. For example, the condensation of water vapour into raindrops
is triggered when air humidity exceeds saturation; landslides are triggered when
the internal friction forces of the material are overcome by the thrust of water
within the pores of the soil; the channels of a hydrographic network begin
to form when running water exerts a force per unit area above a certain value.



Hydrological Data have Complex Trends 2/2

The dynamics include processes that are linearly unstable: for example, the
baroclinic instability that drives meteorological processes at
mid-latitudes.

The dynamics of climate and hydrology are dissipative: they
transform mechanical energy into thermal energy. The
hydrodynamic process of turbulence transports energy from the larger
spatial scales to the smaller ones, where the energy is dissipated through
friction. Wave phenomena of various kinds (e.g. gravity waves) transport the
energy contained in water and in air.



Some Typical Problems
precipitation




incident solar radiation




Flow of the River Adige at San Lorenzo Bridge
[Figure: time series of discharge (m^3/s, axis from 0 to 1400) against year, 1990 to 2005]



Distribution of monthly river flows in Trento




Annual water budget for the Lake of Serraia catchment
[Figure: annual budget of the catchment (2000). Monthly values (m^3/s) from January to December 2000 of P (precipitation), ET (evapotranspiration), Inv (stored volume, accumulation) and R (release); values labelled on the plot: 0.8675, 0.797, 0.343 and -0.184]



Water content of the soil in the Little Washita catchment (Oklahoma)




Spatial distribution of precipitation




Spatial pattern of the hydrographic network



Statistical Inference
and Descriptive Statistics

Lucio Fontana - Expectations (MoMA), 1959




Objectives:

•In these pages the fundamental elements of statistical analysis will be
recalled.

•Population, sample and various elementary statistics, such as mean,
variance and covariance, will be defined.

•The existence of statistics and their value will be argued.

•The concept of random sampling will be introduced.



Statistics

Population and Sample

Statistical inference assumes that a dataset represents a subset, called
the sample, of all the possible cases. The set of all possible
cases is the population from which the dataset has been drawn.
While the sample is known, the population generally is not, and hypotheses
about the population are made implicitly.


Exploratory Data Analysis
temporal representation - histogram

A set of n data constitutes, therefore, a sample of data.
[Figure: a) Bergen, September temperature (°C) against time, 1860 to 2000; b) Bergen, distribution of September temperature (1861-1997), histogram of frequency against temperature (°C)]

These data can be represented in various forms. Each form of
representation emphasises certain characteristics.

Sample Means

Given a sample, various statistics can be calculated. For example:

Temporal Mean:

$$\bar{x} := \frac{1}{n}\sum_{t=1}^{n} x_t$$

Spatial Mean:

$$\bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i$$

The mean is an indicator of position
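
A minimal sketch in R of the two means, assuming a hypothetical matrix `obs` in which rows are time steps and columns are measurement stations (the name, the layout and the values are invented for illustration and are not taken from the slides):

```r
# Hypothetical data: 4 time steps (rows) at 3 stations (columns); values invented
obs <- matrix(c(10, 12, 11,
                 0,  1,  0,
                 5,  7,  6,
                 2,  2,  5),
              nrow = 4, byrow = TRUE)

# Temporal mean: average over the time steps, one value per station
temporal_mean <- colMeans(obs)

# Spatial mean: average over the stations, one value per time step
spatial_mean <- rowMeans(obs)

print(temporal_mean)
print(spatial_mean)
```

Both are the same arithmetic operation; what changes is the index (time or space) over which the sum runs.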


Statistical Inference and Descriptive Statistics

Corrado Caudek





•Statistical inference is the process that allows one to draw
conclusions about a population on the basis of a sample of
observations drawn at random from that population.





•Central to classical statistical inference is the notion of sample distribution
(often called the sampling distribution), that is to say how the statistics of
the samples vary if random samples of the same size n are repeatedly drawn
from the population.

•Even though in any practical application of statistical inference the
researcher has only one random sample of size n, the possibility of
repeating the sampling provides the conceptual foundation for deciding
how informative the observed sample is about the population as a whole.

Exploratory Data Analysis
The mean is not the only indicator of position

Mode


Median and Mode

The mode represents the most frequent value.

If the histogram clearly shows several maxima, the dataset is said to be
multimodal, though identifying the maxima can be controversial.

The median is the value below which 50% of the data lie and
(obviously!) above which the other 50% lie.
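
A minimal sketch in R of both indicators, using an invented sample `x`. R provides `median()` but has no built-in function for the statistical mode, so the sketch tabulates the values; this is an assumption that suits discrete or rounded data, while continuous data would first have to be binned as in a histogram.

```r
# Hypothetical sample; values invented for illustration
x <- c(3, 5, 5, 7, 8, 5, 9, 3)

# Median: half of the data lie below this value, half above
med <- median(x)

# Mode: tabulate the values and keep the most frequent one(s);
# more than one value is returned if the maximum count is shared
counts <- table(x)
mode_values <- as.numeric(names(counts)[counts == max(counts)])

print(med)
print(mode_values)
```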


Empirical Distribution Function

Given the dataset

$$h_i = \{h_1, \cdots, h_n\}$$

and having derived from it the set sorted in ascending order

$$\hat{h}_j = (\hat{h}_1, \cdots, \hat{h}_n), \qquad \hat{h}_1 \le \hat{h}_2 \le \cdots \le \hat{h}_n$$

the empirical cumulative distribution function is defined as

$$ECDF_i(\hat{h}) := \frac{1}{n}\sum_{j=1}^{i} 1 = \frac{i}{n}$$
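
A minimal sketch in R of this construction, assuming an invented dataset `h` without repeated values. It sorts the data, assigns the frequency i/n to the i-th ordered value, and compares the result with R's built-in `ecdf()`.

```r
# Hypothetical precipitation depths in mm; values invented, no ties
h <- c(35, 20, 62, 48, 27, 75, 51, 40)
n <- length(h)

h_sorted <- sort(h)              # ordered set, ascending
ecdf_manual <- seq_len(n) / n    # i/n at the i-th ordered value

# Built-in step function; evaluated at the ordered values it should give i/n
F_hat <- ecdf(h)
ecdf_builtin <- F_hat(h_sorted)

print(cbind(h_sorted, ecdf_manual, ecdf_builtin))
# plot(h_sorted, ecdf_manual, xlab = "h [mm]", ylab = "P[H<h]")  # optional plot
```

With repeated values the built-in function assigns to each distinct value the largest i/n among its copies, so the two columns would differ at the ties.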


ECDF
The empirical cumulative distribution function can be represented as illustrated.
The ordinate of the curve is called the frequency of non-exceedance; the value
of h at which the curve reaches a given frequency q is the q quantile.
[Figure: ECDF of the data, frequency of non-exceedance P[H<h] (from 0 to 1) against h [mm] (from 20 to 80)]

ECDF
The 0.5 quantile splits the data distribution in half with respect to the ordinate.


[Figure: the same ECDF, P[H<h] against h [mm], with the 0.5 quantile level marked on the ordinate]

ECDF

And so the median is identified


[Figure: the same ECDF, P[H<h] against h [mm], with the 0.5 quantile level marked on the ordinate and the median marked on the h axis]

Box and Whisker Diagrams

The procedure can be generalised and represented with a box and whisker diagram.

[Figure: the same ECDF, P[H<h] against h [mm], with the 0.25, 0.5 and 0.75 quantiles marked, together with the box and its “whisker”]
The box and whisker diagram is another way of representing the data distribution.
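
A minimal sketch in R of the quantities behind such a diagram, using an invented dataset `h`. `quantile()` and `boxplot.stats()` are base R functions; note that they follow slightly different conventions (interpolated quantiles versus Tukey hinges), so the edges of the box need not coincide exactly with the 0.25 and 0.75 quantiles.

```r
# Hypothetical precipitation depths in mm; values invented
h <- c(35, 20, 62, 48, 27, 75, 51, 40)

# The three quantiles marked on the ECDF
q <- quantile(h, probs = c(0.25, 0.5, 0.75))

# Interquartile range: the length of the box
iqr <- unname(q["75%"] - q["25%"])

# Five numbers drawn by boxplot(): lower whisker, lower hinge,
# median, upper hinge, upper whisker
box <- boxplot.stats(h)$stats

print(q)
print(box)
# boxplot(h, horizontal = TRUE, xlab = "h [mm]")  # optional plot
```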


Parameters and Statistics

A parameter is a number that describes a certain aspect of the population.

• For example, the (true) mean annual precipitation at a weather station
is a parameter. Let us suppose that this mean is

$$\mu_h = 980 \text{ mm}$$

• In any concrete situation the parameters are unknown.

Parameters and Statistics

A statistic is a number that can be calculated on the basis of data
given by a sample, without any knowledge of the parameters of the
population.

• Let us suppose, for example, that the random sample of precipitation
data covers 30 years of measurement and that the mean annual
precipitation computed from the sample is

$$\bar{h} = 1002 \text{ mm}$$

• This mean is a statistic.

Other Statistics: the Range

Rx := max(x) − min(x)

The range is the simplest indicator of the spread of the data. It is an indicator of the
scale of the data. However, it uses only two data values and ignores
the other n-2 values that make up the sample.

Other Statistics: Variance and
Standard Deviation

$$Var(x) := \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

$$\sigma_x := \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The variance is an indicator of “scale” that considers all the data of the sample.

Other Statistics: Variance and
Standard Deviation
“corrected” version (unbiased)

$$Var(x) := \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

$$\sigma_x := \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The unbiased version of the variance takes into account that only n-1 of the
deviations from the mean are independent, because the mean is fixed.
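
A minimal sketch in R comparing the two versions, together with the range, on the four values {2, 3, 5, 9} that the slides use later as a population. R's built-in `var()` and `sd()` use the n-1 denominator, so the 1/n version has to be written out by hand.

```r
x <- c(2, 3, 5, 9)
n <- length(x)

# Range: uses only the two extreme values
r_x <- max(x) - min(x)

# Variance with the 1/n denominator
var_n <- mean((x - mean(x))^2)

# "Corrected" (unbiased) variance and standard deviation, 1/(n-1) denominator
var_unbiased <- var(x)
sd_unbiased <- sd(x)

print(c(range = r_x, var_n = var_n, var_unbiased = var_unbiased))
```

The two variances differ by the factor n/(n-1), which matters for small samples and becomes negligible as n grows.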


Coefficient of Variation

• The coefficient of variation (CV) of a data sample is defined as the
ratio of the standard deviation to the mean:

$$CV_x := \frac{\sigma_x}{\bar{x}}$$

• The greater the coefficient of variation, the less informative
the mean is about the future behaviour of the
population.
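
A minimal sketch in R, assuming two invented series of annual precipitation with the same mean but different spread. The function name `cv` is hypothetical, and the sketch uses `sd()`, that is the n-1 version of the standard deviation.

```r
# Coefficient of variation: standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

# Hypothetical annual precipitation in mm; both series have mean 1000
steady   <- c(980, 1010, 1000, 990, 1020)
variable <- c(600, 1400, 1000, 800, 1200)

cv_steady <- cv(steady)
cv_variable <- cv(variable)

print(c(steady = cv_steady, variable = cv_variable))
```

The second series has the larger CV, so its mean says less about what any single year looks like.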


Other Statistics: Skewness and Kurtosis

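$$sk_x := \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma_x}\right)^3$$

Skewness is a measure of the asymmetry of the data distribution.

$$k_x := -3 + \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma_x}\right)^4$$

Kurtosis is a measure of the “peakedness” of the data distribution. With the
offset of -3, $k_x$ is close to zero for data drawn from a normal distribution.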
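
A minimal sketch in R of the two statistics. Base R has no built-in skewness or kurtosis, so the helper functions `standardise`, `skewness` and `excess_kurtosis` below are hypothetical names written for this example; they assume the 1/n version of the standard deviation and subtract 3 from the fourth standardised moment, so that a normal distribution scores close to zero.

```r
# Standardised deviations, using the 1/n standard deviation
standardise <- function(x) {
  (x - mean(x)) / sqrt(mean((x - mean(x))^2))
}

# Third standardised moment: zero for symmetric data
skewness <- function(x) mean(standardise(x)^3)

# Fourth standardised moment minus 3 (excess kurtosis)
excess_kurtosis <- function(x) mean(standardise(x)^4) - 3

x_sym  <- c(-2, -1, 0, 1, 2)   # symmetric sample, invented
x_skew <- c(1, 1, 1, 2, 10)    # long right tail, invented

print(c(skewness(x_sym), skewness(x_skew)))
print(excess_kurtosis(x_sym))
```

A positive skewness indicates a longer right tail, which is common for quantities such as daily precipitation or river discharge.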


Estimation and Hypothesis Testing

Usually, we are not interested in the statistics for their own sake, but in
what they tell us about the population of interest.
• We could, for example, use the mean annual precipitation, measured
at all hydro-meteorological stations, to estimate the mean annual
precipitation for the Italian Peninsula.
• Or, we could use the mean of the sample to establish whether the
mean annual precipitation has changed over the period covered by the
sample.


These two questions belong to the two main branches of classical
statistical inference:

• The estimation of parameters

• Statistical hypothesis testing

Sample Variability
A fundamental aspect of sample statistics is that they vary from one
sample to the next. In the case of annual precipitation, it is very
improbable that the sample mean of 1002 mm will coincide
with the mean of the population.
• The variability of a sample statistic from sample to sample is called
sample variability.
– When the sample variability is very high, the statistic tells us
little about the population parameter.
– When the sample variability is small, the statistic is informative,
even though it is practically impossible for the statistic of a
sample to be exactly equal to the population parameter.



Sample Variability
Simulation
Sample variability will be illustrated as follows:
1. we will consider a discrete variable that can assume only a small
number of possible values (N = 4);
2. all possible samples of size n = 2 will be listed;
3. the mean will be calculated for each possible sample of size n = 2;
4. the distribution of the means of the samples of size n = 2 will be
examined.
The mean $\mu$ and the variance $\sigma^2$ of the population will be calculated.
Note that $\mu$ and $\sigma^2$ are parameters, while the mean $\bar{x}_i$ and the
variance $s_i^2$ of each sample are statistics.



Sample Variability

•The experiment in this example consists of n=2 draws with
replacement of a marble x_i from an urn that contains N=4 marbles.

•The marbles are numbered as follows: {2, 3, 5, 9}

•Drawing with replacement corresponds to a population of
infinite size (it is in fact always possible to draw a marble from the urn).



Sample Variability

•For each sample of size n=2 the mean of the values of the marbles
drawn is calculated:

$$\bar{x} = \sum_{i=1}^{2}\frac{x_i}{2}$$

•For example, if the marbles drawn are x_1=2 and x_2=3, then:

$$\bar{x} = \frac{2+3}{2} = \frac{5}{2} = 2.5$$



Sample Variability
Three Distributions
We must distinguish between three distributions:

1. the population distribution

2. the distribution of a sample

3. the sample distribution of the means of all possible samples



Sample Variability
๏ 1. The Population Distribution

The population distribution: the distribution of X (the value of the
marble extracted) in the population. In this specific case the population
is of infinite size and has the following probability distribution:

$x_i$     $p_i$
2         1/4
3         1/4
5         1/4
9         1/4
Total     1



Sample Variability

•The mean of the population is:

$$\mu = \sum_i x_i p_i = 4.75$$

•The variance of the population is:

$$\sigma^2 = \sum_i (x_i - \mu)^2 p_i = 7.1875$$
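
A minimal sketch in R of these two sums, using the values and probabilities of the population distribution given in the slides (the variable names are chosen for this example):

```r
x <- c(2, 3, 5, 9)     # values of the marbles
p <- rep(1/4, 4)       # each value is equally probable

# Population mean: probability-weighted sum of the values
mu <- sum(x * p)

# Population variance: probability-weighted sum of squared deviations
sigma2 <- sum((x - mu)^2 * p)

print(c(mu = mu, sigma2 = sigma2))
```

Because the probabilities are all equal, these coincide with the arithmetic mean and the 1/N variance of the four values.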



Sample Variability
๏ 2. The Distribution of a Sample

The distribution of a sample: the distribution of X in a specific sample.

• If, for example, x_1 = 2 and x_2 = 3, then the mean of this
sample is $\bar{x} = 2.5$ and the (unbiased) variance is $s^2 = 0.5$.



Sample Variability

๏ 3. The Sample Distribution of the Means
The sample distribution of the means: the distribution of the means
of all the possible samples.

• If the size of the samples is n=2, then there are 4 × 4 = 16 possible
samples. We can therefore list their means.

sample    mean $\bar{x}_i$    sample    mean $\bar{x}_i$
{3, 2}    2.5                 {2, 3}    2.5
{5, 2}    3.5                 {2, 5}    3.5
{9, 2}    5.5                 {2, 9}    5.5
{5, 3}    4.0                 {3, 5}    4.0
{9, 3}    6.0                 {3, 9}    6.0
{9, 5}    7.0                 {5, 9}    7.0
{2, 2}    2.0                 {3, 3}    3.0
{5, 5}    5.0                 {9, 9}    9.0



Sample Variability

•The sample distribution of the means has the following probability
distribution:
$\bar{x}_i$    $p_i$
2.0            1/16
2.5            2/16
3.0            1/16
3.5            2/16
4.0            2/16
5.0            1/16
5.5            2/16
6.0            2/16
7.0            2/16
9.0            1/16
Total          1



Sample Variability

•The mean of the sample distribution of the means is:

$$\mu_{\bar{x}} = \sum_i \bar{x}_i p_i = 4.75$$

•The variance of the sample distribution of the means is:

$$\sigma_{\bar{x}}^2 = \sum_i (\bar{x}_i - \mu_{\bar{x}})^2 p_i = 3.59375$$
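
The enumeration of the 16 samples, the resulting distribution of the means, and its mean and variance can be reproduced exactly with a short R sketch. `expand.grid()` is a base R function that lists all ordered pairs; the variable names are chosen for this example.

```r
X <- c(2, 3, 5, 9)

# All 4 x 4 = 16 ordered samples of size n = 2 (drawing with replacement)
samples <- expand.grid(x1 = X, x2 = X)
samples$mean <- rowMeans(samples[, c("x1", "x2")])

# Probability of each distinct value of the sample mean
distr <- table(samples$mean) / nrow(samples)
means <- as.numeric(names(distr))
probs <- as.numeric(distr)

# Mean and variance of the sample distribution of the means
mu_xbar <- sum(means * probs)
var_xbar <- sum((means - mu_xbar)^2 * probs)

print(distr)
print(c(mu_xbar = mu_xbar, var_xbar = var_xbar))
```

Unlike a simulation, this enumeration is exact, which is feasible only because N and n are very small.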



Sample Variability
The example we have seen is very particular in that the
population is known. In practice the population distribution is never
known.
However, we can take note of two important properties of the sample
distribution of the means:

•The mean of the sample distribution of the means, $\mu_{\bar{x}}$, is the same as the
population mean $\mu$.

•The variance of the sample distribution of the means, $\sigma_{\bar{x}}^2$, is equal to
the ratio of the variance of the population $\sigma^2$ to the size n of
the sample:

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{7.1875}{2} = 3.59375$$



Sample Variability
The two things to note can be summarised as follows:

•The mean and variance of the sample distribution of the means are
determined by the mean and variance of the population:

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

•For n > 1, the variance of the sample distribution of the means is smaller than
the variance of the population.
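
The variance property can be derived in one line, under the assumption (not stated explicitly in the slides, but satisfied by drawing with replacement) that the n draws are independent and all have variance $\sigma^2$:

```latex
\sigma_{\bar{x}}^2
  = Var\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n} Var(x_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
```

The second equality is where independence is used: for correlated draws, covariance terms would appear and the variance of the mean would no longer be simply $\sigma^2/n$.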



Sample Variability
In what follows, we will use the properties of the sample distribution to
make inferences about the parameters of the population even when
the population distribution is not known.



Sample Variability
Three Distributions
Therefore, we have distinguished between three distributions:

1. the population distribution

$$\Omega = \{2, 3, 5, 9\}, \quad \mu = 4.75, \quad \sigma^2 = 7.1875$$

2. the distribution of a sample

$$\Omega_i = \{2, 3\}, \quad \bar{x} = 2.5, \quad s^2 = 0.5$$

3. the sample distribution of the means of all possible samples

$$\Omega_{\bar{x}} = \{2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0, 7.0, 9.0\}, \quad \mu_{\bar{x}} = 4.75, \quad \sigma_{\bar{x}}^2 = 3.59375$$



Sample Variability
1. The population distribution: this is the distribution that contains all
possible observations. The mean and variance of this distribution are
indicated with $\mu$ and $\sigma^2$.

2. The distribution of a sample: this is the distribution of the values of
the population that make up a particular random sample of size n. The
single values are indicated $x_1, \ldots, x_n$, and the mean and variance are
indicated $\bar{x}$ and $s^2$.

3. The sample distribution of the means of the samples: this is the
distribution of the $\bar{x}_i$ for all the possible samples of size n that can be
drawn from the population being considered. The mean and variance
of the sample distribution of the means are indicated by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$.



Sample Variability
The distribution that is the basis of statistical inference is the sample
distribution.

Definition: the sample distribution of a statistic is the distribution of
the values that the statistic assumes over all samples of size n that
can be drawn from the population.

Note that if a simulation considers fewer samples than all
those theoretically possible, then the resulting distribution will only be
an approximation of the true sample distribution.




Having computed various statistics, we can now pose some questions. For
example:

• Do the samples all have the same mean and the same variance?

• Does the mean depend on the size of the sample?

• Does the variance depend on the size of the sample?




If the samples do not all have the same mean, a trend may be present.



The variance can vary with the size of the sample!

If it does not stabilise as the number of data in the sample increases, then the data
are said to have “Infinite Variance Syndrome”.
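
A minimal sketch in R of what "not stabilising" looks like, using synthetic data rather than hydrological measurements: a normal sample, whose variance is finite, and a Cauchy sample, a textbook distribution whose variance is not finite. The helper `running_var` is a hypothetical name written for this example, and the seed is arbitrary.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
m <- 10000
x_norm <- rnorm(m)          # synthetic data with finite variance (equal to 1)
x_cauchy <- rcauchy(m)      # synthetic data without a finite variance

# Unbiased variance of the first k values, for every k from 1 to length(x)
running_var <- function(x) {
  k <- seq_along(x)
  s1 <- cumsum(x)
  s2 <- cumsum(x^2)
  v <- (s2 - s1^2 / k) / (k - 1)
  v[1] <- NA                # the variance of a single value is undefined
  v
}

rv_norm <- running_var(x_norm)
rv_cauchy <- running_var(x_cauchy)

# Variance estimates at increasing sample sizes
k_show <- c(10, 100, 1000, 10000)
print(rbind(k = k_show, normal = rv_norm[k_show], cauchy = rv_cauchy[k_show]))
```

For the normal sample the running variance settles near 1 as k grows, whereas for the Cauchy sample it tends to jump whenever a new extreme value enters the sample.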



Null Hypothesis

We will have a chance to look at hypothesis testing in detail in future
lectures. However, it is well to remember the following:

• Generally, it is not possible to definitively prove anything. One can
only attempt to show that a hypothesis is not true.

• Let H0 be the (null) hypothesis to be tested. If H0 cannot be rejected,
then one can affirm that “it is true” with a certain degree of confidence.



Other Statistics: Covariance

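Given two datasets, for example:

$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$

The covariance between these two datasets is defined as:

$$Cov(h_i, l_i) := \frac{1}{N-1}\sum_{i=1}^{n}(l_i - \bar{l}_i)(h_i - \bar{h}_i)$$

where $\bar{l}_i$ and $\bar{h}_i$ are the means of the two datasets and N is the
number of pairs in the sum (here N = n).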



Other Statistics: Correlation

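Given two datasets, for example:

$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$

The correlation between these two datasets is defined as:

$$\rho_{lh} := \frac{Cov(l, h)}{\sigma_h \sigma_l}$$

where $\sigma_h$ and $\sigma_l$ are the standard deviations of the two datasets.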
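
A minimal sketch in R of covariance and correlation written out by hand and checked against the built-in `cov()` and `cor()`. The two series are invented, and the sketch assumes the n-1 denominator for both the covariance and the standard deviations, which is what the built-in functions use.

```r
# Hypothetical paired observations; values invented for illustration
h <- c(12, 30, 5, 48, 22)          # e.g. precipitation depth
l <- c(1.1, 2.4, 0.8, 3.9, 2.0)    # e.g. a second quantity measured at the same times
n <- length(h)

# Covariance: mean product of the deviations, with the n-1 denominator
cov_manual <- sum((l - mean(l)) * (h - mean(h))) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
rho_manual <- cov_manual / (sd(h) * sd(l))

print(c(cov_manual = cov_manual, cov_builtin = cov(l, h)))
print(c(rho_manual = rho_manual, rho_builtin = cor(l, h)))
```

The correlation is dimensionless and always lies between -1 and 1, which makes it easier to compare across pairs of quantities than the covariance.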




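Note that one can also consider the correlation between two series of
equal length taken from the same sample, one shifted by one step with respect to the other:

$$h_i = \{h_1, \cdots, h_{n-1}\} \quad \text{and} \quad h_{i+1} = \{h_2, \cdots, h_n\}$$

Resulting in:

$$Cov(h_i, h_{i+1}) := \frac{1}{N-1}\sum_{i=1}^{n-1}(h_i - \bar{h}_i)(h_{i+1} - \bar{h}_{i+1})$$

where N = n-1 is the number of pairs in the sum.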




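Repeating this operation for series that are progressively shorter
and separated by r instants, the resulting series are:

$$h_i^r = \{h_1, \cdots, h_{n-r}\} \quad \text{and} \quad h_{i+r} = \{h_{r+1}, \cdots, h_n\}$$

From which:

$$Cov(h_i^r, h_{i+r}) := \frac{1}{N-1}\sum_{i=1}^{n-r}(h_i^r - \bar{h}_i^r)(h_{i+r} - \bar{h}_{i+r})$$

$$\rho(h_i^r, h_{i+r}) := \frac{Cov(h_i^r, h_{i+r})}{\sigma_i^r \, \sigma_{i+r}}$$

where N = n-r is the number of pairs in the sum, and $\sigma_i^r$ and $\sigma_{i+r}$ are
the standard deviations of the two series.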
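
A minimal sketch in R of the lag-r correlation built exactly as described: the series is split into a leading part and a part shifted by r, and the two are correlated, each with its own mean and standard deviation. The function name `lag_correlation` and the data are hypothetical. R's built-in `acf()` uses a slightly different estimator (a single overall mean and a 1/n normalisation), so its values are close to, but not identical with, the ones computed here.

```r
# Correlation between h_1..h_{n-r} and h_{r+1}..h_n
lag_correlation <- function(h, r) {
  n <- length(h)
  stopifnot(r >= 1, r < n - 1)      # at least two pairs are needed
  leading <- h[1:(n - r)]           # h_1, ..., h_{n-r}
  shifted <- h[(r + 1):n]           # h_{r+1}, ..., h_n
  cor(leading, shifted)
}

# Hypothetical series; values invented for illustration
h <- c(5, 7, 6, 9, 12, 10, 8, 6, 5, 7, 9, 11)

# Correlation as a function of the lag r
lags <- 1:5
rho <- sapply(lags, function(r) lag_correlation(h, r))
print(rbind(lag = lags, rho = round(rho, 3)))
```

Plotting `rho` against `lags` gives the autocorrelation diagram, which shows how quickly the series loses memory of its past values.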



Other Statistics: Autocorrelation




Random Sampling

Within the strategy of creating and analysing data samples, the selection (or,
sometimes, the generation) of random samples plays an important role.

A sample of n events selected from a population is a random sample if it has the
same probability of being selected as any other sample of the same size.

If the data are generated, then one is carrying out a random experiment. Some
examples of this are:
•tossing a coin;
•counting the rainy days in a year; and
•counting the days when the river flow at the Bridge of San Lorenzo, Trento, is
greater than a predetermined value.
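
A minimal sketch in R of both situations, selecting a random sample from a known population and generating data with a random experiment. `sample()` is the base R function also used in the simulation later in these slides; the seed, the `flow` values and the `threshold` are invented for illustration.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
population <- c(2, 3, 5, 9)

# Selection: every sample of size 2 is equally likely
without_repl <- sample(population, size = 2)                  # no value drawn twice
with_repl    <- sample(population, size = 2, replace = TRUE)  # repeats allowed

# Generation: a random experiment, here 10 tosses of a fair coin
tosses <- sample(c("H", "T"), size = 10, replace = TRUE)

# Counting events above a predetermined value in a hypothetical flow record (m^3/s)
flow <- c(120, 340, 95, 410, 280)
threshold <- 300
days_above <- sum(flow > threshold)

print(without_repl)
print(with_repl)
print(table(tosses))
print(days_above)
```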




Sample Variability
Simulation 2
Let us consider another example where sample variability is illustrated as
follows:
1. the same population as in the previous example will be used (N = 4);
2. using the software R, 50,000 samples of size n = 2 will be
drawn, with replacement, from the population;
3. the mean will be calculated for each of these samples of size n = 2;
4. the mean and variance of the distribution of the means of the 50,000
samples of size n = 2 will be calculated.



Sample Variability
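Simulation 2

N <- 4                # size of the population
n <- 2                # size of each sample
nSamples <- 50000     # number of samples to draw
X <- c(2, 3, 5, 9)

# Mean and variance of the population (var() divides by N-1, so rescale to 1/N)
Mean <- mean(X)
Var <- var(X) * (N - 1) / N

# 50,000 samples are drawn with replacement; the mean of each is stored
SampDistr <- rep(0, nSamples)
for (i in 1:nSamples) {
  samp <- sample(X, n, replace = TRUE)
  SampDistr[i] <- mean(samp)
}

# Mean and variance of the simulated distribution of the means
MeanSampDistr <- mean(SampDistr)
VarSampDistr <- var(SampDistr) * (nSamples - 1) / nSamples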


Sample Variability
Simulation 2

Results of the simulation in R:

> Mean
[1] 4.75
> Var
[1] 7.1875
> MeanSampDistr
[1] 4.73943
> VarSampDistr
[1] 3.578548
> Var/n
[1] 3.59375



Sample Variability
๏Population:

$$\mu = 4.75, \quad \sigma^2 = 7.1875$$

๏Sample distribution of the means:

$$\mu_{\bar{x}} = 4.75, \quad \sigma_{\bar{x}}^2 = 3.59375$$

๏Results of the R simulation:

$$\hat{\mu}_{\bar{x}} = 4.73943, \quad \hat{\sigma}_{\bar{x}}^2 = 3.578548$$



Thank you for your attention!

G. Ulrici - Man after having worked on the slides, 2000?

