JOURNAL OF FINANCIAL AND QUANTITATIVE ANALYSIS VOL. 29, NO. 3, SEPTEMBER 1994
A Study of Monthly Mutual Fund Returns and
Performance Evaluation Techniques
Mark Grinblatt and Sheridan Titman*
Abstract
This paper empirically contrasts the Jensen Measure, the Positive Period Weighting Measure
developed in Grinblatt and Titman (1989b), and a measure developed from the Treynor-Mazuy
(1966) quadratic regression on a sample of 279 mutual funds and 109 passive portfolios,
using a variety of benchmark portfolios. The study finds that the measures generally
yield similar inferences when using the same benchmark and that inferences can vary, even
from the same measure, when using different benchmarks. This paper also analyzes the
determinants of mutual fund performance. Tests of fund performance that employ fund
characteristics, such as net asset value, load, expenses, portfolio turnover, and management
fee, are reported. These tests surprisingly suggest that turnover is significantly positively
related to the ability of fund managers to earn abnormal returns.
I. Introduction
The development of the Capital Asset Pricing Model (CAPM) in the mid-1960s
provided financial economists with a tool for adjusting returns for risk. An
important application of this model, implemented by Jensen (1968), (1969),^1 is the
evaluation of the performance of managed portfolios. However, this approach to
evaluating portfolio performance has been the subject of a great deal of controversy.
There are three major reasons for this controversy: benchmark efficiency,
timing, and statistical power. This paper seeks to empirically assess the importance
of each of these three issues. We do this by studying the performance of
a sample of 109 passive portfolios constructed from security characteristics and
industry groups, as well as a sample of 279 mutual funds. Differences in the
performance of the passive portfolios across the various evaluation techniques would
confirm that measured performance is potentially sensitive to the technique used. Evidence
that the abnormal performance, as measured by a particular evaluation technique,
systematically deviates from zero would indicate a bias in the technique, given
that uninformed investors can easily mimic these passive portfolios. Differences
in the performance of the mutual funds would indicate that the evaluation of the strategies
followed by mutual fund managers is sensitive to the technique used.

*Anderson Graduate School of Management, University of California, Los Angeles, Los Angeles,
CA 90024, and Carroll School of Management, Boston College, Chestnut Hill, MA 02167, respectively.
The authors thank Julian Franks, Bruce Lehmann, David Mayers, Rena Repetti, Jay Shanken, JFQA
Referee and Associate Editor Rex Thompson, and seminar participants at University of California, Los
Angeles, University of California, Berkeley, University of British Columbia, University of Washington,
Duke University, Rutgers University, and the Wharton School, University of Pennsylvania, for valuable
comments on earlier drafts. The authors also appreciate the contributions of Jim Brandon, Nick Crew,
Pierre Hillion, Khai Kan, Haeyon Kim, Erik Sirri, and Mark Tsesarsky, who provided excellent research
assistance, and of Bruce Lehmann and David Modest, who supplied monthly factor returns. Titman
gratefully acknowledges financial support from the Batterymarch Fellowship program. Both authors
acknowledge financial support from the UCLA Academic Senate.
^1 An equivalent approach was developed by Treynor (1965). The issues discussed in this paper that
apply to Jensen's Measure also apply to Treynor's Measure.
A. Benchmark Efficiency
The first source of controversy is that the CAPM approach (and analogous
multifactor approaches) to performance evaluation requires the use of a benchmark
portfolio(s).^2 As Roll (1978) and others have noted, performance evaluation with
these methods is likely to be sensitive to the benchmark choice.^3 In particular,
benchmarks that are mean-variance inefficient provide erroneous inferences. The
well-known size and dividend-yield biases, documented in tests of the CAPM,
provide one set of recipes for managers who wish to game an evaluation with
CAPM-based benchmarks.
To assess the benchmark issue, we analyze the sensitivity of the different performance
measures to the choice of benchmark. The benchmarks examined include the CRSP
equally-weighted index (EW), the CRSP value-weighted index (VW), a benchmark
consisting of ten factor portfolios (F10) constructed by Lehmann and Modest (1988),
and an eight-portfolio benchmark (P8) developed by Grinblatt and Titman (1988) and
used in Grinblatt and Titman (1989a).
B. Timing Ability
The second source of controversy in this literature is a statistical bias in
Jensen's evaluation technique that arises whenever an evaluated portfolio
successfully times the market.^4 This bias can result in successful timers generating
negative performance numbers, even in large samples.
In response to this problem, Grinblatt and Titman (1989b) proposed a new
measure, the Positive Period Weighting Measure, that is not subject to the
timing-related perversities of traditional evaluation techniques. An alternative to the
Positive Period Weighting Measure is a measure developed here that uses the
Treynor and Mazuy (1966) quadratic regression to aggregate the effects of timing
and selectivity ability.^5 This "Treynor-Mazuy Total Performance Measure" is
specifically designed to pick up beta variations that are linearly related to the
return of the benchmark portfolio. If returns are normally distributed, then in the
absence of timing ability, these two new measures generate the same inferences
on average as the Jensen Measure. However, funds that either time the market or
pick portfolios with returns that are co-skewed with the benchmark returns will
exhibit different performance with these three measures.

^2 See Grinblatt and Titman (1993) for a performance measurement approach that does not require
a benchmark portfolio.
^3 See also the discussion in Elton, Gruber, Das, and Hlavka (1993).
^4 See Jensen (1972), Admati and Ross (1985), Dybvig and Ross (1985), and Grinblatt and Titman
(1989b).
^5 See Admati et al. (1986) for the conditions under which this is true.
C. Statistical Power
The final source of controversy is statistical power. Portfolio returns are
noisy, which makes it difficult to detect abnormal performance when it exists. For
example, a portfolio manager of a billion-dollar fund (of which there are many)
who was able to consistently generate a 2-percent abnormal return would be
creating over $20 million per year in value for the fund's investors. However,
an excess return of 2 percent per year is generally not statistically significant,
even with ten years of monthly return observations. Studies that employ a large
sample of mutual funds often exacerbate this problem, because the levels
of statistical significance must be adjusted to account for the fact that, out of a
sample of 200 funds, a few will exhibit extreme performance simply by chance.
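A back-of-the-envelope calculation makes the power problem concrete. The sketch below is illustrative only and is not the authors' procedure: the 2 percent monthly residual standard deviation is a hypothetical input, and the intercept's standard error is approximated by the residual standard deviation divided by the square root of T, which ignores the sampling error from estimating the benchmark loading. The function name is ours.

```python
import math

def approx_alpha_t_stat(annual_alpha, monthly_resid_sd, n_months):
    """Rough t-statistic for a monthly regression intercept.

    Approximates the intercept's standard error by resid_sd / sqrt(T),
    ignoring the extra variance from estimating the slope coefficient(s).
    """
    monthly_alpha = annual_alpha / 12.0
    std_error = monthly_resid_sd / math.sqrt(n_months)
    return monthly_alpha / std_error

# Hypothetical inputs: 2% per year abnormal return, ten years of monthly data,
# and an assumed 2% monthly residual standard deviation.
t_stat = approx_alpha_t_stat(0.02, 0.02, 120)  # about 0.91

# Multiple comparisons: if none of 200 funds has any ability, a 5% two-tailed
# test still flags about 10 of them by chance.
n_funds, test_level = 200, 0.05
expected_false_positives = n_funds * test_level
# A Bonferroni-style adjustment would instead test each fund at 0.05 / 200.
per_fund_level = test_level / n_funds
```

Under these assumptions the t-statistic is less than half of the roughly 1.98 two-tailed 5 percent cutoff for 118 degrees of freedom.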
The strategy implemented in this paper requires that we have some prior
beliefs about the determinants of superior (or inferior) portfolio performance. Even
when performance measures in isolation are not sufficiently powerful to reject the
null hypothesis of no performance, tests using prior beliefs about the determinants
of performance may have power to reject. For example, if we suspect that funds
with low net asset values can outperform funds with high net asset values (because
they have a smaller effect on market prices with their offers to buy and sell), we
can estimate cross-sectional regressions of performance on net asset value.
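A minimal sketch of such a cross-sectional regression follows. The fund data are hypothetical toy numbers, and using the logarithm of net asset value as the regressor is an illustrative assumption rather than the specification reported in this paper.

```python
import math

def cross_sectional_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x across funds."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical funds: net asset values ($ millions) and monthly abnormal returns.
nav = [50.0, 120.0, 400.0, 900.0, 2500.0]
alpha = [0.0021, 0.0013, 0.0002, -0.0004, -0.0010]
intercept, slope = cross_sectional_ols([math.log(v) for v in nav], alpha)
# A negative slope is what the "small funds move prices less" hypothesis predicts.
```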
The paper is organized as follows. Section II describes the data. Section
III describes the measures and benchmarks. Section IV assesses the benchmark
issue. Section V assesses the timing issue. Section VI describes the relation
between performance and fund characteristics such as turnover ratio, management
fee, expense ratio, and load. Section VII concludes the paper.
II. The Data
A. Mutual Fund Data
Mutual fund data were obtained from CDA Investment Technologies, Inc., of
Silver Spring, Maryland. The data consist of monthly cash-distribution-adjusted
returns and investment goals for 279 funds that existed from December 31, 1974,
to December 31, 1984. The data were spot-checked against data collected by hand
and found to be accurate. As with most mutual fund studies, the mutual fund return
data are subject to survivorship bias. Since CDA's nonacademic clients have little
interest in mutual funds that no longer exist, funds that went out of business prior
to December 31, 1984, are excluded from the CDA data set. Grinblatt and Titman
(1989a) estimated the survivorship bias in this sample, and it does not appear to be
large: on the order of 0.5 percent per year.^6
Fund characteristics were obtained from the Wiesenberger Investment Companies
Service (1975). These include annual data on net asset value, load, management
fee, expense ratio, and portfolio turnover at the beginning of the sample
period.

^6 See also the analysis in Brown, Goetzmann, Ibbotson, and Ross (1992).
B. Stock Data
In addition to the sample of mutual funds, we also obtained stock returns from
the CRSP Daily Returns File. The daily returns were compounded to calculate the
monthly portfolio returns used to form and test the benchmark portfolios, as well
as to evaluate the performance of 109 passive investment strategies. In addition, data
on cash dividends and interest rates, used to form some of the passive strategies
and to compute a risk-free rate, were obtained from the CRSP Daily Master and Bond
Files, respectively.
Since the passive strategies do not use private information, they should gen-
erate zero performance with properly designed measures and benchmarks. The
109 portfolios include 37 industry portfolios and 72 portfolios formed on the basis
of six characteristics that are related to CAPM and APT "anomalies." Firms are
divided into 37 industry portfolios based on their two-digit SIC codes at the be-
ginning of the sample period. All "two-digit" industries with at least 20 firms are
included in the analysis. The 72 characteristic portfolios are formed by ranking
the stocks on the basis of the different characteristics and then dividing them into
12 equally-weighted portfolios based on their rankings. For a given characteristic,
portfolio 1 represents the portfolio formed from firms with the lowest rankings of
that characteristic. (For example, portfolio 1 of the size portfolios consists of firms
with sizes among the lowest 8⅓ percent.) Specifically, the six characteristics are:
1) Firm size, determined by the most recent capitalization available on the CRSP
Master File prior to the month of the observed return.
2) Dividend yield, calculated from the CRSP Master File using the calendar year
prior to the observed return.
3) Past returns, computed from the CRSP Daily Returns File using the three
calendar years prior to the observed return.
4) Interest rate sensitivity, as measured by the slope coefficient on an equally-weighted
index of 16- to 21-year government bonds in an excess return regression
using this bond portfolio's return and the return of an equally-weighted
portfolio of all CRSP-listed stocks as regressors. The time series uses the three
calendar years prior to the observed return.
5) Co-skewness, as measured by the slope coefficient on the "squared term" in
a regression using the excess return and squared excess return of the equally-weighted
stock portfolio. The time series uses the three calendar years prior to
the observed return.
6) Beta, as computed against the equally-weighted stock portfolio in a market
model excess return regression using the three calendar years prior to the
observed return.
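The ranking step that produces the 72 characteristic portfolios can be sketched as follows. This is an illustrative reconstruction under stated assumptions: one cross-section of stocks per month, ties kept in input order, and a hypothetical function name. Stocks are sorted on a characteristic and split into 12 equally-weighted portfolios, with portfolio 1 holding the lowest-ranked stocks.

```python
def characteristic_portfolios(characteristic, returns, n_groups=12):
    """Equal-weighted returns of portfolios formed by ranking stocks on one characteristic.

    characteristic[i] and returns[i] refer to stock i in a single month.
    Element 0 of the result is portfolio 1, the lowest-ranked stocks.
    """
    n = len(characteristic)
    if n < n_groups:
        raise ValueError("need at least as many stocks as portfolios")
    # Stock indices sorted from lowest to highest characteristic value
    # (ties keep their input order because sorted() is stable).
    order = sorted(range(n), key=lambda i: characteristic[i])
    groups = [[] for _ in range(n_groups)]
    for rank, i in enumerate(order):
        # Map ranks 0..n-1 onto groups 0..n_groups-1 in roughly equal slices.
        groups[rank * n_groups // n].append(returns[i])
    return [sum(g) / len(g) for g in groups]

# Hypothetical month with 24 stocks, so each of the 12 portfolios holds two stocks.
sizes = [float(i) for i in range(24)]
rets = [float(i) for i in range(24)]
portfolio_returns = characteristic_portfolios(sizes, rets)
```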
III. Measures of Performance and Benchmark Portfolios
A. Performance Measure Description
Three performance measures will be compared in this study: the Jensen
Measure, the Positive Period Weighting Measure, and the Treynor-Mazuy Measure
of Total Performance. Each calculates performance relative to a benchmark, which
is a portfolio or a group of portfolios, and computes abnormal returns by using the
benchmark to adjust the average return of a portfolio for risk. If the methodology
behind the measures is correct, we interpret the measures as the average amount
per month by which a manager beats a passive portfolio with equivalent risk, per
dollar invested.
The Jensen Measure is the intercept in a regression of the time series of
excess returns (above the one-month Treasury bill rate) of the evaluated portfolio
against the time series of excess returns of the benchmark portfolio(s). This is the
traditional measure used in most previous studies of fund performance.
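A minimal single-benchmark sketch of the Jensen Measure, using only the standard library. With a multiple-portfolio benchmark such as P8 or F10, the intercept would instead come from a multiple regression on all of the benchmark portfolios' excess returns, which is not shown. The function name and toy data are hypothetical.

```python
def jensen_measure(fund_returns, bench_returns, tbill_returns):
    """OLS intercept (alpha) and slope (beta) of fund on benchmark excess returns.

    All inputs are monthly returns of equal length; excess returns are taken
    over the one-month Treasury bill rate.
    """
    y = [r - rf for r, rf in zip(fund_returns, tbill_returns)]
    x = [r - rf for r, rf in zip(bench_returns, tbill_returns)]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Toy example: a fund built to earn 0.2% per month above 1.2 times the
# benchmark's excess return (hypothetical numbers).
rf = [0.004, 0.004, 0.005, 0.005]
bench = [0.014, -0.016, 0.035, 0.005]
fund = [f + 0.002 + 1.2 * (b - f) for b, f in zip(bench, rf)]
alpha, beta = jensen_measure(fund, bench, rf)
```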
The Positive Period Weighting Measure, developed in Grinblatt and Titman
(1989b), is obtained in two steps. First, one selects a vector of weights, $w_1, \ldots, w_T$.
Each element of the vector corresponds to one time series observation. Second,
the performance of a fund is computed by taking the dot product of the weight
vector and the excess return vector of the mutual fund, i.e.,

$$\alpha = \sum_{t=1}^{T} w_t R_{pt}.$$

The weight vector is chosen to have nonnegative weights that make the
weighted sum of the excess returns of the benchmark portfolio(s) equal to zero,
i.e.,

$$\sum_{t=1}^{T} w_t R_{It} = 0, \qquad w_t \geq 0,$$

where $R_{It}$ is the period-$t$ excess return of the index portfolio used as a benchmark
and $R_{pt}$ is the period-$t$ excess return of the evaluated fund.
Thus, the weight vector is both benchmark specific and sample period specific.
Obviously, there are many sets of weights with the properties mentioned
above. The weights employed in this study can be interpreted as the marginal
utilities of an uninformed investor with power utility. Given this interpretation,
uninformed investors with power utility prefer to add to their existing optimal
passive portfolio a small amount of any mutual fund return with a measure that
is positive. Grinblatt and Titman (1989b) provide conditions under which posi-
tive values for these measures imply that the mutual fund manager has superior
information. This measure is discussed in more detail in Appendix A.
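The two-step construction can be sketched as follows. The specific weighting is an assumption on our part, not a reproduction of Appendix A: weights proportional to (1 + R_It) raised to the power -γ (a power-utility marginal-utility form, applied here to excess returns for simplicity), normalized to sum to one, with γ chosen by bisection so that the weighted benchmark excess return is zero. Bisection works because the weighted benchmark return falls monotonically as γ rises, provided the benchmark has a positive mean excess return and at least one negative month. Function names and data are hypothetical.

```python
def ppw_weights(bench_excess, gamma_hi=500.0, iterations=200):
    """Hypothetical power-utility weights: positive, summing to one, with sum_t w_t R_It = 0."""
    def weights(gamma):
        raw = [(1.0 + r) ** (-gamma) for r in bench_excess]
        total = sum(raw)
        return [v / total for v in raw]

    def weighted_bench(gamma):
        return sum(w * r for w, r in zip(weights(gamma), bench_excess))

    lo, hi = 0.0, gamma_hi
    if weighted_bench(lo) <= 0 or weighted_bench(hi) >= 0:
        raise ValueError("no root in [0, gamma_hi]: need a positive mean and some negative months")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if weighted_bench(mid) > 0:   # root lies at a larger gamma
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

def ppw_measure(weights, fund_excess):
    """Positive Period Weighting Measure: dot product of weights and fund excess returns."""
    return sum(w * r for w, r in zip(weights, fund_excess))

bench = [0.04, -0.02, 0.01, -0.01, 0.03]
w = ppw_weights(bench)
passive_like = ppw_measure(w, [1.5 * r for r in bench])  # pure benchmark exposure: about 0
selector = ppw_measure(w, [0.001 + r for r in bench])    # 0.1% per month extra: about 0.001
```

Because the weights sum to one, a constant monthly excess return above the benchmark passes straight through to the measure, while any fixed leverage on the benchmark scores zero.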
The Treynor-Mazuy (1966) quadratic regression is similar to the Jensen Measure
regression. Here, however, there are two explanatory variables: the excess
return of the benchmark portfolio and the square of that excess return. The
intercept in this regression provides an estimate of selectivity ability; the product
of the quadratic term's slope coefficient and the variance of the benchmark return
(henceforth, the Treynor-Mazuy Timing Measure) provides an estimate of timing
ability. We call the sum of the timing and selectivity terms the Treynor-Mazuy
Measure of Total Performance. The latter measure is discussed in more detail in
Appendix B.
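A minimal sketch of the quadratic regression and the three resulting quantities, using only the standard library. It solves the least-squares normal equations directly, which is adequate for three regressors but is not the most numerically robust approach, and it computes the benchmark variance with a 1/T divisor, which is our choice. Function names and data are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def treynor_mazuy(fund_excess, bench_excess):
    """Fit R_pt = a + b*R_It + c*R_It**2 + e_t and return selectivity, timing, and total."""
    rows = [[1.0, x, x * x] for x in bench_excess]
    # Normal equations (X'X) coef = X'y for the three regressors.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, fund_excess)) for i in range(3)]
    a, b, c = solve_linear(xtx, xty)
    n = len(bench_excess)
    mean_x = sum(bench_excess) / n
    var_x = sum((x - mean_x) ** 2 for x in bench_excess) / n  # 1/T divisor (our choice)
    timing = c * var_x
    return {"selectivity": a, "timing": timing, "total": a + timing, "beta": b}

bench = [0.03, -0.01, 0.02, -0.04, 0.01, 0.0]
fund = [0.001 + 0.9 * x + 2.0 * x * x for x in bench]  # hypothetical successful timer
result = treynor_mazuy(fund, bench)
```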
In principle, if we use the Treynor-Mazuy regression to analyze performance
with multiple-portfolio benchmarks, a large number of cross-product terms must
be included in the regression. For the P8 and F10 benchmarks, the number of
cross-product terms would be very large, making this infeasible. Instead, we
calculate P8 and F10 Treynor-Mazuy measures that use the returns of
the ex post efficient combination of the portfolios included in the benchmarks.
However, since no prior research has offered a theoretical justification for doing
this, any Treynor-Mazuy performance results with multiple-portfolio
benchmarks should be interpreted cautiously.
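One way to form an ex post efficient combination is to take the sample tangency (maximum Sharpe ratio) portfolio of the benchmark portfolios' excess returns, with weights proportional to the inverse covariance matrix times the mean vector. This construction is our assumption; the text does not spell out the weights. The sketch below uses a small Gaussian-elimination solver, and the function names and inputs are hypothetical. The combined series would then serve as the single benchmark in the quadratic regression.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sample_moments(series):
    """series: T rows of K portfolio excess returns. Returns (means, sample covariance)."""
    t, k = len(series), len(series[0])
    mu = [sum(row[j] for row in series) / t for j in range(k)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in series) / (t - 1)
            for j in range(k)] for i in range(k)]
    return mu, cov

def tangency_weights(mu, cov):
    """Weights proportional to cov^{-1} mu, rescaled to sum to one."""
    raw = solve_linear(cov, mu)
    total = sum(raw)
    return [v / total for v in raw]

def combined_returns(series, weights):
    """Monthly excess returns of the fixed-weight combination of benchmark portfolios."""
    return [sum(w * r for w, r in zip(weights, row)) for row in series]

# Hypothetical two-portfolio benchmark with uncorrelated returns.
weights = tangency_weights([0.01, 0.005], [[0.04, 0.0], [0.0, 0.01]])  # about [1/3, 2/3]
```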
B. Benchmark Portfolio Description
There are four benchmarks: the first two are the monthly rebalanced equally-
weighted index computed from all CRSP securities and the CRSP value-weighted
index. The third benchmark is a factor portfolio benchmark, created from factor
portfolio weights used in Lehmann and Modest (1988). The input portfolio weights
were derived from a 10-factor maximum likelihood factor analysis over the 1978-
1982 period. The portfolios contain 750 securities in the 1978-1982 period and
slightly fewer in the 1975-1977 and 1983-1984 periods since some of the securities
from the middle period did not exist in the early and later periods. Although
this method of forming factor portfolios can potentially create survivorship bias,
(unreported) comparisons with the equally-weighted index suggest that this bias
is not large.
Past research suggests that none of these benchmarks is mean-variance effi-
cient. In particular, they generate biased performance measures that relate to size
(Banz (1981), Reinganum (1981)), dividend yield (Litzenberger and Ramaswamy
(1979), (1982)), and beta (Black, Jensen, and Scholes (1972)). In the 1975-1984
sample period, Grinblatt and Titman (1988) found the same size, dividend yield,
and beta-related biases with the EW and F10 benchmarks.^7
The fourth benchmark, the P8 benchmark developed in Grinblatt and Titman
(1988) and used in Grinblatt and Titman (1989a), is not subject to any of the
aforementioned biases. The basic idea underlying the formation of this benchmark is
that various firm characteristics are correlated with their stocks' factor loadings. As
a result, characteristic-based portfolios can be used as proxies for the factors. The
eight-portfolio benchmark consists of four size-based portfolios, three
dividend-yield-based portfolios, and the lowest past returns portfolio. The first
size-based portfolio is the equal weighting of the smallest 8⅓ percent of firms; the
second is the average of the second and third smallest size portfolios (out of 12);
the third is the average of the fourth through ninth smallest size portfolios; and the
fourth is the average of the three largest size portfolios. Equal weightings of the two
lowest dividend-yield portfolios (out of 12), of the fifth and sixth lowest dividend-yield
portfolios, and of the tenth and eleventh dividend-yield portfolios make up the three
dividend-yield portfolios in the benchmark. The lowest past returns portfolio (out of 12)
is the eighth portfolio in the benchmark.
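The mapping from the 12-way characteristic portfolios to the eight benchmark portfolios can be written down directly. A minimal sketch, assuming the 12 size, dividend-yield, and past-returns portfolio returns for a month are already available (for example, from a ranking step like the one described in Section II.B); the dictionary keys and function name are hypothetical.

```python
# Portfolio numbers (1 = lowest, 12 = highest) averaged into each P8 portfolio,
# following the description in the text.
P8_SPEC = [
    ("size", [1]),
    ("size", [2, 3]),
    ("size", [4, 5, 6, 7, 8, 9]),
    ("size", [10, 11, 12]),
    ("dividend_yield", [1, 2]),
    ("dividend_yield", [5, 6]),
    ("dividend_yield", [10, 11]),
    ("past_returns", [1]),
]

def p8_returns(month):
    """month maps a characteristic name to its 12 characteristic-portfolio returns."""
    result = []
    for name, members in P8_SPEC:
        rets = month[name]
        # Equal weighting of the listed characteristic portfolios.
        result.append(sum(rets[k - 1] for k in members) / len(members))
    return result

# Hypothetical month in which portfolio k of every characteristic returned k.
demo = {name: [float(k) for k in range(1, 13)]
        for name in ("size", "dividend_yield", "past_returns")}
p8 = p8_returns(demo)  # [1.0, 2.5, 6.5, 11.0, 1.5, 5.5, 10.5, 1.0]
```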
^7 Grinblatt and Titman (1988) did not test the VW benchmark for mean-variance efficiency, but
there is a well-known (and, by historical standards, large) size-related bias with this benchmark over
the sample period.
There is some evidence for the assertion that the P8 benchmark better reflects
true performance than does the factor benchmark. First, as we stated earlier, there
do not appear to be biases in the P8 benchmark associated with the well-known
CAPM and APT anomalies in finance (see Grinblatt and Titman (1988)). Second,
the performance statistics reported in Grinblatt and Titman (1989a) with the P8
benchmark are very similar to the scores in Grinblatt and Titman (1993), which
do not require a benchmark and instead make use of the fund's prior portfolio
holdings to risk-adjust a fund's average return.^8
IV. The Sensitivity of Performance to Different Benchmarks
Tables 1-3 analyze the sensitivity of performance to benchmarks using the
samples of 109 passive portfolios and 279 mutual funds. Table 1 presents the
average monthly abnormal retums in these samples with the three measures and
the four benchmarks. Table 2 presents correlation matrices that examine the extent
to which benchmarks matter. Table 3 presents regression coefficients and F-tests
of the null hypothesis that various pairs of measures are identical.
A. The Sensitivity of Average Performance to Different Benchmarks
The average abnormal returns in Table 1 can be generated either with a
cross section (find the performance of each fund, then average) or with a time
series (equally weight the returns of the funds, then find the performance of the
equally-weighted portfolio). The t-statistics, presented below the abnormal
returns, are computed from the time series standard errors.^9 Thus, under the random
walk hypothesis and the null hypotheses of no performance and homoskedastic
residual variances, the t-statistics should follow Student's t-distribution.
The procedures for calculating the t-statistics for the Positive Period Weighting and
Treynor-Mazuy Measures are described in Appendices A and B, respectively.
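For a common benchmark series and sample period, the two routes give the same average: the OLS intercept is linear in the dependent variable, so the mean of the funds' Jensen intercepts equals the intercept of the equally-weighted portfolio. (The Positive Period Weighting and Treynor-Mazuy measures are also linear in the fund's excess returns.) Only the standard errors differ, which is why the time series is used for the t-statistics. A quick check with hypothetical returns and a single benchmark:

```python
def ols_intercept(y, x):
    """Intercept from an OLS regression of y on a single regressor x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx

# Hypothetical monthly excess returns for one benchmark and three funds.
bench = [0.02, -0.01, 0.03, 0.00, -0.02]
funds = [
    [0.025, -0.012, 0.031, 0.002, -0.018],
    [0.010, -0.004, 0.020, 0.001, -0.015],
    [0.030, -0.020, 0.045, -0.001, -0.030],
]
# Cross section: estimate each fund's intercept, then average.
cross_section = sum(ols_intercept(f, bench) for f in funds) / len(funds)
# Time series: equally weight the funds each month, then estimate one intercept.
ew_portfolio = [sum(month) / len(funds) for month in zip(*funds)]
time_series = ols_intercept(ew_portfolio, bench)
```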
Note that with either Panel A (passive portfolios) or Panel B (mutual funds)
in Table 1, both the average performance and the t-statistic in each column (same
benchmark, different measures) vary much less than the numbers in each row
(same measure, different benchmarks). Hence, for the average passive portfolio
and average mutual fund, benchmarks seem to matter much more for performance
than do measures.
1. Passive Portfolios
With the exception of the performance results with the value-weighted index
and the Treynor-Mazuy Measure with the P8 benchmark, all of the passive
portfolios' average performance numbers in Table 1, Panel A are small. The largest
performance is seen with the value-weighted index, which exhibits an abnormal return
of about 10 percent per year for the average passive portfolio. Since the average
passive portfolio should exhibit zero performance, the magnitude of these performance
numbers implies an inefficiency in the VW benchmark that can easily be
gamed (for example, by buying small stocks). The P8 benchmark, by contrast,
yields an average abnormal return of about 0.1 percent per year for the average
passive portfolio with both the Jensen Measure and the Positive Period Weighting
Measure.

TABLE 1
Means and t-Statistics for the Three Performance Measures Using Four Benchmarks

Benchmark  EW Index     VW Index     F10          P8
Panel A. 109 Passive Portfolios
JM         0.0002       0.0080       0.0004       0.0001
           (1.50)       (2.83)**     (0.50)       (0.42)
PW         0.0003       0.0080       0.0007       0.0001
           (1.66)       (2.82)**     (0.88)       (0.46)
TM         0.0003       0.0080       0.0003       -0.0022
           (1.70)       (2.82)**     (0.39)       (-11.29)**
Panel B. 279 Equity Mutual Funds
JM         -0.0028      0.0009       -0.0033      -0.0004
           (-1.59)      (1.07)       (-3.56)**    (-0.65)
PW         -0.0034      0.0008       -0.0037      -0.0001
           (-1.92)      (1.01)       (-3.92)**    (-0.23)
TM         -0.0037      0.0008       -0.0043      -0.0025
           (-2.09)*     (0.98)       (-4.48)**    (-4.23)**

This table presents the average performance of 109 passive portfolios formed on the basis of security
characteristics as well as 279 mutual funds calculated with the Jensen (JM), the Positive Period
Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. The benchmarks used
include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10),
and eight Characteristic-Based Portfolios (P8). The t-statistics (in parentheses) are calculated from the
120 time series observations. For the Jensen Measure, they are the standard intercept t-statistic derived
from a regression of the excess returns of an equally-weighted portfolio of passive portfolios or mutual
funds on the excess returns of the benchmark portfolio(s). For the other two measures, see Appendices
A and B, respectively.
*Significant at 0.05 level (two-tailed test).
**Significant at 0.01 level (two-tailed test).

^8 ... ignores the fact that neither of these two papers focused on actual fund returns, but only on
hypothetical returns constructed from the CRSP-listed portion of their holdings. If non-CRSP-listed
securities are important components of the numbers we report here, then the P8 benchmark could have
hidden biases.
^9 The cross-sectional standard deviations and standard errors are biased because the returns of
different funds are correlated. For the passive portfolios, the 12 cross-sectional standard deviations of
the abnormal monthly returns range from 0.0021 to 0.0027. For the mutual funds, the 12 range from
0.0030 to 0.0036.
2. Mutual Funds
The mutual funds exhibit average abnormal performance that ranges from
about -4 or -5 percent to 1 percent per year in each of the rows. Once again, the
P8 benchmark with the Positive Period Weighting Measure and Jensen Measure
is closest to zero. If we add 2 percent per year in fees and transaction costs to
this number, this benchmark would seem to indicate that the average mutual fund
manager beats the market by about 2 percent per year before costs. This performance
is consistent with the performance evaluation of gross returns in Grinblatt and Titman
(1993), which employed fund holdings, rather than a benchmark, to adjust for risk.
None of the other benchmarks generates performance numbers that are consistent
with these earlier findings.
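The annual figures quoted in this subsection and the previous one are simple multiples of the monthly averages in Table 1 (12 times the monthly figure, ignoring compounding). The arithmetic can be checked directly; the 2 percent add-back for fees and transaction costs is the text's own round number, and the dictionary layout is ours.

```python
# Monthly average abnormal returns copied from Table 1: (sample, measure, benchmark).
table1_monthly = {
    ("passive", "JM", "VW"): 0.0080,
    ("passive", "JM", "P8"): 0.0001,
    ("funds", "TM", "F10"): -0.0043,
    ("funds", "JM", "VW"): 0.0009,
    ("funds", "PW", "P8"): -0.0001,
}

def annualize(monthly):
    """Simple (non-compounded) annualization of an average monthly return."""
    return 12 * monthly

annual = {key: annualize(value) for key, value in table1_monthly.items()}
# Passive, VW: about 9.6% per year ("10 percent"); passive, P8: about 0.12% ("0.1 percent").
# Funds range from about -5.2% (TM, F10) to about 1.1% (JM, VW).
gross_p8 = annual[("funds", "PW", "P8")] + 0.02  # about 1.9%, i.e., roughly 2 percent
```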
The magnitudes of Table 1, Panel B's average performance scores for the
mutual funds using the equally-weighted and factor analysis benchmarks (about
-3.5 to -5 percent per year) are too large to be explained by the transaction costs
and the expenses of the funds.^10 This suggests that the negative performance
must be due either to the funds systematically picking stocks that do poorly or
to the benchmarks being inefficient. Given the known size-related biases of these
benchmarks, and the fact that mutual funds tend to invest in larger than average
firms, the latter possibility is more plausible. Indeed, since average fees and
transaction costs are less than 3 percent per year, the highly negative abnormal
returns of the mutual funds alone with the EW and F10 benchmarks should be
sufficient to reject the benchmarks as valid indicators of a fund manager's ability.
Panel B's positive mutual fund performance with the value-weighted bench-
mark can be similarly explained by examining the composition of the average
mutual fund portfolio. The average mutual fund tilts its portfolio toward small
stocks more than the value-weighted index, but less than the equally-weighted
index. Thus, the known size-related bias of the value-weighted benchmark over
this period is probably sufficient to generate the observed results.
With the P8 benchmark, which is not subject to this size-related bias, average
performance is virtually zero (except for the Treynor-Mazuy measure). Clearly,
the conclusions that one would draw about the overall performance of the mutual
fund industry are strongly influenced by the choice of benchmarks. Moreover,
they are likely to be erroneous if the benchmark can be easily gamed by exploiting
CAPM and APT anomalies.
B. Correlations between Performance Estimates Using Different Benchmarks
Table 2 reports correlations that, in combination with the results from Table
1,^11 illustrate the sensitivity of the different performance measures to the choice
of benchmark. Of particular note are the correlations between the performance
scores generated with the P8 benchmark and those generated with the three other
benchmarks, which are not very large. While the EW, VW, and F10 benchmarks also
generate different inferences, their pairwise correlations are much higher. For example,
with the Jensen Measure and the passive portfolios, the correlation between EW and F10
performance is 0.64, which is about twice as large as the correlation between P8
performance and performance with either of these two benchmarks (0.35 and 0.32).
Table 2 suggests that the performance of individual funds is likely to be
sensitive to the choice of benchmark portfolios, even in cases where the average
fund has similar performance with two benchmarks (e.g., P8 and VW in Panel B of
Table 1). The correlations between performance with any pair of benchmarks are not
close to one. The passive portfolios are deliberately composed of stock portfolios that
differ as much as possible in a number of important dimensions and thus exhibit
lower correlations than the mutual funds. However, even with the mutual funds,
the maximum correlation does not exceed 0.91. Moreover, except for the F10 and
EW comparison, the largest mutual fund correlations are with benchmark pairs
that exhibit highly different average scores in Table 1.

^10 See Grinblatt and Titman (1989a) for estimates of these costs.
^11 If, for two benchmarks (measures), the means and t-statistics are similar and the correlations are
close to one, the scores of individual funds are virtually identical with the two benchmarks (measures).
TABLE 2
Pearson Correlations between Abnormal Returns Using Different Benchmarks

               109 Passive Portfolios      279 Equity Mutual Funds
Benchmark      VW      F10     P8          VW      F10     P8
Panel A. Jensen Measure
EW             0.67    0.64    0.35        0.91    0.86    0.60
VW                     0.51    0.17                0.69    0.63
F10                            0.32                        0.42
Panel B. Positive Period Weighting Measure
EW             0.74    0.69    0.19        0.89    0.89    0.48
VW                     0.65    0.09                0.71    0.57
F10                            0.21                        0.37
Panel C. Treynor-Mazuy Total Performance Measure
EW             0.75    0.71    0.37        0.88    0.87    0.57
VW                     0.59    0.22                0.65    0.61
F10                            0.41                        0.43

This table presents the Pearson correlations between the performance numbers generated with four
different benchmarks on a sample of 109 passive portfolios formed on the basis of security
characteristics as well as 279 mutual funds. The benchmarks include the Equally-Weighted Index (EW), the
Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).
These correlations are calculated separately for three performance measures: the Jensen, the Positive
Period Weighting, and the Treynor-Mazuy Total Performance Measures.
C. A Formal Test of the Similarity of Performance between Benchmarks
A limitation of the analysis in the previous subsection is that it reports no
significance levels for the correlations. Because the
vectors used to compute the correlations contain correlated random elements, the
standard significance tests are biased. This section presents a formal F-test of
whether the performance numbers with two different benchmarks are similar. The
procedure is based on an extension and aggregation of the techniques developed in
Fama and MacBeth (1973), Sefcik and Thompson (1986), and Gibbons, Ross, and
Shanken (1989), and makes use of the time series to compute significance levels
for the similarity of the measures.
For N funds, let $\alpha_1$ and $\alpha_2$ denote the two $N \times 1$ vectors of performance
computed using two different benchmarks (and the same measure). A univariate
cross-sectional regression of $\alpha_1$ on $\alpha_2$ has an intercept of zero and a slope
coefficient of one under the null hypothesis that $\alpha_2$ is an unbiased estimate of the
performance measure $\alpha_1$. The reverse regression tests whether $\alpha_1$ is an unbiased
estimate of $\alpha_2$. The two performance measures can only be unbiased estimates
of each other when they are identical, since forecast error in either regression
biases the slope coefficient in the other regression toward zero. As a result, either
the regression or the reverse regression will reject the hypothesis of unbiasedness
whenever the measures are sufficiently different. A small sample statistic with a
known distribution exists only when the test is performed separately for the
regression and the reverse regression (assuming bivariate normality). Unfortunately, no
similar test exists for the joint hypothesis that the coefficients have these values in
both the regression and the reverse regression simultaneously.
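A small numerical illustration of why at least one direction must fail when the measures differ. The scores are hypothetical: $\alpha_2$ is $\alpha_1$ plus noise uncorrelated with $\alpha_1$, so the regression of $\alpha_2$ on $\alpha_1$ has intercept 0 and slope 1, while the regression of $\alpha_1$ on $\alpha_2$ has its slope pulled toward zero by the noise (errors-in-variables attenuation). This shows only the point estimates; it is not the time series F-test of Appendix C.

```python
def ols_fit(y, x):
    """Return (intercept, slope) from a cross-sectional OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

# Hypothetical scores: alpha2 equals alpha1 plus noise that is uncorrelated with alpha1.
alpha1 = [1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0]
alpha2 = [a + e for a, e in zip(alpha1, noise)]

forward = ols_fit(alpha1, alpha2)  # alpha1 on alpha2: slope attenuated to 0.5
reverse = ols_fit(alpha2, alpha1)  # alpha2 on alpha1: intercept 0.0, slope 1.0
```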
TABLE 3
F-Tests of the Similarity between Performance Scores with Different Benchmarks

                Dependent Variable Benchmark
                (Degrees of Freedom for F-Test)
Independent     EW          VW          F10         P8
Variable        (2,117)     (2,117)     (2,108)     (2,110)

Panel A. Jensen Measure
EW        γ0                0.004       -0.001      0.001
          γ1                0.980       0.928       0.647
          F                 9.97**      0.58        13.75**
VW        γ0    -0.004                  -0.004      -0.001
          γ1    0.842                   0.695       0.632
          F     6.04**                  18.46**     9.72**
F10       γ0    -0.000      0.003                   0.001
          γ1    0.793       0.691                   0.421
          F     1.08        12.98**                 19.76**
P8        γ0    -0.003      0.001       -0.003
          γ1    0.548       0.623       0.418
          F     4.05*       10.74**     19.73**

Panel B. Positive Period Weighting Measure
EW        γ0                0.004       -0.001      0.002
          γ1                0.931       0.928       0.556
          F                 13.46**     0.51        18.95**
VW        γ0    -0.004                  -0.004      -0.001
          γ1    0.843                   0.704       0.635
          F     7.39**                  20.07**     8.03**
F10       γ0    -0.000      0.003                   0.001
          γ1    0.847       0.709                   0.416
          F     0.61        15.55**                 22.24**
P8        γ0    -0.003      0.001       -0.004
          γ1    0.408       0.514       0.334
          F     6.29**      12.20**     22.48**

Panel C. Treynor-Mazuy Total Performance Measure
EW        γ0                0.004       -0.001      -0.000
          γ1                0.905       0.956       0.581
          F                 15.29**     0.51        7.59**
VW        γ0    -0.004                  -0.005      -0.003
          γ1    0.855                   0.701       0.605
          F     7.78**                  23.49**     23.35**
F10       γ0    -0.000      0.003                   -0.001
          γ1    0.785       0.609                   0.399
          F     0.95        19.62**                 12.67**
P8        γ0    -0.002      0.002       -0.003
          γ1    0.552       0.608       0.461
          F     5.81**      22.90**     18.14**

This table presents the intercepts (γ0), slope coefficients (γ1), and time series F-tests (described in
Appendix C) from cross-sectional regressions of performance with one benchmark against performance
with another benchmark. The benchmarks include the Equally-Weighted Index (EW), the Value-Weighted
Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8). These tests are
calculated separately for three performance measures: the Jensen, the Positive Period Weighting, and
the Treynor-Mazuy Total Performance Measures.
*Significant at 0.05 level.
**Significant at 0.01 level.
The F-test of whether the coefficient vector in a single regression significantly
deviates from (0,1) is biased when the cross section is used to compute the F-
statistic. To remedy this, we employ a time series procedure to compute the
relevant F-statistic. The steps required for computing this time series F-statistic
are reported in Appendix C.
Table 3 reports the coefficient estimates for cross-sectional regressions and reverse regressions for the mutual funds along with results of an F-test of whether the intercept is zero and the slope coefficient is one in these regressions.[12] The results are presented in three matrices, where each matrix corresponds to a measure. The rows in each matrix correspond to the independent variables in the cross-sectional regression. The columns correspond to the dependent variable. Each element in the matrix consists of a triplet: respectively, the least squares intercept estimate, the slope coefficient estimate, and the time series F-statistic.
The results, which are consistent between each regression and its correspond-
ing reverse regression, provide conclusions that are similar to those derived from
Tables 1 and 2: benchmarks generally do matter. There are significant deviations
from an intercept of zero and a slope coefficient of one for all of the regressions,
except for results with the factor benchmark versus the equally-weighted bench-
mark.
V. The Sensitivity of Performance to Different Measures
Tables 1, 4, and 5 indicate that the Jensen and Positive Period Weighting
Measures provide very similar inferences. Table 4 examines correlations between
the performance scores using the different measures, but the same benchmark.
The most striking observation is the magnitude of these numbers. For any given
benchmark, the correlation between the performance scores exceeds 0.94 for the
passive portfolios and 0.97 for the mutual funds.
In combination with the means and t-statistics in Table 1, these results indicate highly similar scores between the Jensen and Positive Period Weighting Measures.[13] This is true for both the passive portfolios and the mutual funds.
The Treynor-Mazuy Total Performance Measure, which also exhibits near perfect
correlation with the other two measures, is also highly similar for three of the four
benchmarks. However, it appears to shift performance downward by 2.5 percent
to 3 percent per year for the P8 benchmark, as indicated by the means in Table
1. This downward shift appears to be virtually constant on a fund by fund basis,
whether looking at individual passive portfolios or looking at individual mutual
funds. One has to conclude from this that the Treynor-Mazuy Measure is gener-
ating spurious inferences with the P8 benchmark. As was suggested earlier, this
is possible because of the ad hoc innovation employed to adapt this measure to a
multiple portfolio benchmark.
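The near-perfect correlations in Table 4 and the significant Treynor-Mazuy F-statistics with the P8 benchmark in Table 5 are compatible: a shift that is nearly constant across funds leaves the correlation close to one but shows up in the intercept of the cross-sectional regression. A minimal sketch with hypothetical scores (not the paper's data), assuming a shift of 0.0025 per month, roughly 3 percent per year:

```python
import random


def ols(x, y):
    """Intercept and slope from a univariate least squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
# Hypothetical monthly Jensen scores (decimal form) for 279 funds.
jensen = [rng.gauss(-0.0005, 0.002) for _ in range(279)]
# A second measure equal to the same scores shifted down by 0.0025 per month
# plus a little fund-specific noise.
shifted = [j - 0.0025 + rng.gauss(0.0, 0.0002) for j in jensen]

rho = correlation(jensen, shifted)
intercept, slope = ols(jensen, shifted)   # regress shifted scores on Jensen scores
annual_shift_pct = 1200 * (sum(shifted) - sum(jensen)) / len(jensen)
print(f"correlation {rho:.3f}, intercept {intercept:+.4f}, slope {slope:.3f}, "
      f"average shift {annual_shift_pct:.1f} percent per year")
```

A correlation near one therefore says little about whether two measures agree in level; the cross-sectional intercept, and hence a test of (0, 1), picks up the shift.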
[12] F-statistics cannot be computed for the passive portfolios because computing the F would require inversion of a singular or nearly singular matrix.
[13] Unreported cross-sectional standard errors are also virtually identical across measures, irrespective of the benchmark.
TABLE 4
Pearson Correlations between Abnormal Returns Using Different Measures
          109 Passive Portfolios    279 Equity Mutual Funds
Measure   PW        TM              PW        TM
Panel A. EW Benchmark
JM        0.97      0.96            0.99      0.98
PW                  1.00                      1.00
Panel B. VW Benchmark
JM        1.00      1.00            1.00      1.00
PW                  1.00                      1.00
Panel C. F10 Benchmark
JM        0.95      0.94            0.99      0.98
PW                  0.99                      0.99
Panel D. P8 Benchmark
JM        0.95      0.94            0.98      0.97
PW                  0.94                      0.97
This table presents the Pearson correlations between the performance numbers generated with three different performance measures on a sample of 109 passive portfolios formed on the basis of security characteristics as well as 279 mutual funds. The measures include the Jensen (JM), the Positive Period Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. The benchmarks include the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).
A. A Formal Test of the Similarity between Performance Measures
Table 5, which follows the same format as Table 3, formally tests whether mea-
sures matter. It reports the intercepts and slope coefficients from cross-sectional
regressions of performance with one measure against performance with another
measure. Time series F-statistics, analogous to those produced in Table 3, test
whether the intercept and slope coefficients in a particular regression are, respec-
tively, zero and one. Each matrix in Table 5 corresponds to a benchmark. Rows
represent independent variables in the cross-sectional regression. Columns corre-
spond to the dependent variables. Once again, both the regressions and the reverse
regressions all seem to yield the same inferences.
The F-statistics are generally far below the critical F that demarcates the 5-percent significance level (most are close to zero). This indicates that different measures generally yield the same performance scores. The only significant F-statistics are those indicating a distinction between the Treynor-Mazuy Measure and the other two measures with the P8 benchmark, which is consistent with our discussion in the last section.
B. When Do the Jensen and Positive Period Weighting Measures Differ?
This section has concluded that for most funds, the different measures pro-
vide similar inferences. In particular, the Jensen Measure and the Positive Period
Weighting Measure are virtually identical for most mutual funds, irrespective of
the benchmark. The analysis presented in this subsection suggests that the ob-
served similarity between these two measures arises because most mutual funds
fail to successfully time the market. The Positive Period Weighting and Jensen
Measures could nonetheless differ for a few mutual funds that do time the market. This would indicate that, for some purposes, employing the Positive Period Weighting Measure in lieu of the Jensen Measure could still be worthwhile.
TABLE 5
F-Tests for the Similarity between Performance Scores Using Different Measures
Independent       Dependent Variable Measure
Variable Measure  JM          PW          TM
Benchmark: EW (F-test degrees of freedom: 2,117)
JM    γ0                      -0.001      -0.001
      γ1                      1.011       1.024
      F                       0.062       0.130
PW    γ0          0.000                   -0.000
      γ1          0.967                   1.019
      F           0.060                   0.015
TM    γ0          0.001       0.000
      γ1          0.940       0.978
      F           0.138       0.015
Benchmark: VW (F-test degrees of freedom: 2,117)
JM    γ0                      0.000       -0.000
      γ1                      0.996       0.995
      F                       0.002       0.005
PW    γ0          0.000                   -0.000
      γ1          1.003                   0.999
      F           0.002                   0.001
TM    γ0          0.000       0.000
      γ1          1.004       1.001
      F           0.005       0.001
Benchmark: F10 (F-test degrees of freedom: 2,108)
JM    γ0                      -0.000      -0.001
      γ1                      0.975       1.043
      F                       0.211       0.502
PW    γ0          0.000                   -0.000
      γ1          0.996                   1.066
      F           0.115                   0.223
TM    γ0          0.001       0.000
      γ1          0.921       0.922
      F           0.513       0.253
Benchmark: P8 (F-test degrees of freedom: 2,110)
JM    γ0                      0.000       -0.002
      γ1                      1.074       0.960
      F                       0.208       7.367*
PW    γ0          -0.000                  -0.002
      γ1          0.888                   0.872
      F           0.428                   8.289*
TM    γ0          0.002       0.003
      γ1          0.987       1.084
      F           8.103*      7.956*
This table presents the intercepts (γ0), slope coefficients (γ1), and time series F-tests (described in Appendix C) from cross-sectional regressions of performance with one measure against performance calculated with a different measure but using the same benchmark. The measures include the Jensen (JM), the Positive Period Weighting (PW), and the Treynor-Mazuy Total Performance (TM) Measures. These tests are calculated separately using four benchmarks: the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).
*Significant at 0.05 level.
**Significant at 0.01 level.
To test whether, for some mutual funds, the difference between the Jensen
and Positive Period Weighting Measures arises from timing, we regress this differ-
ence against the Treynor-Mazuy Timing Measure: the product of the coefficient
of the Treynor-Mazuy quadratic term times the variance of the benchmark portfolio (or the ex post efficient combination of the portfolios in the benchmark). This cross-sectional regression, which is reported in Table 6, documents a statistically significant relation between these variables.[14] The significance is robust with respect to the benchmark used.
[14] The time series procedure for calculating the t-statistics in this table is described in Appendix C.
TABLE 6
How Timing Performance Relates to the Difference between
the Positive Period Weighting and Jensen Measures
Benchmark
EW VW F10 P8
Slope Coefficient 0.194 0.025 0.204 0.168
Time Series t-Statistic 5.84** 3.45** 2.22*
Using the sample of 279 mutual funds, this table reports the slope coefficients and the time series t-statistics (described in Appendix C) of regressions of the difference between the Positive Period Weighting Measure and the Jensen Measure on the Treynor-Mazuy Timing Measure. These tests are calculated separately using four benchmarks: the Equally-Weighted Index (EW), the Value-Weighted Index (VW), 10 Factor Portfolios (F10), and eight Characteristic-Based Portfolios (P8).
*Significant at 0.05 level.
**Significant at 0.01 level.
VI. The Power Issue: Fund Characteristic Tests
Grinblatt and Titman (1989a) used joint intercept tests to document significant
differences in the performance of mutual funds. These tests, however, offer no ev-
idence about the determinants of these differences. Regressions of cross-sectional
performance measures on fund characteristics may offer more powerful and more
interesting tests of performance against certain alternative hypotheses. They may
also offer insights into how inferences about the determinants of performance are
affected by the benchmark choice.
Five fund characteristics reported in the 1975 edition of The Wiesenberger Investment Companies Service are used in these regressions:[15]
1) Net Asset Value in millions of dollars on December 31, 1974.
2) Load, computed at the end of 1974, which is the range of sales charges per dollar investment in the fund, in percent terms.
3) Expense ratio (%), which is the fee-inclusive expenses of the fund divided by average net asset value over the fiscal year prior to 1975.
4) Turnover (%), which is the minimum of the total market value of purchases or sales (excluding transactions in U.S. government securities) divided by average net asset value in the fiscal year prior to 1975.
5) Management fee (%), which is the schedule of (annualized) fees charged for
various net asset values. We took the fee percentage that is relevant for the
December 31, 1974, net asset value.
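The turnover and expense ratio definitions above reduce to simple ratios. A minimal sketch, with hypothetical function names and dollar figures:

```python
def turnover_pct(purchases, sales, avg_net_assets):
    """Turnover (%): the smaller of total purchases or total sales (excluding
    U.S. government securities) divided by average net asset value."""
    return 100.0 * min(purchases, sales) / avg_net_assets


def expense_ratio_pct(expenses_incl_fees, avg_net_assets):
    """Expense ratio (%): fee-inclusive expenses divided by average net asset value."""
    return 100.0 * expenses_incl_fees / avg_net_assets


# Hypothetical fund, figures in millions of dollars for one fiscal year.
print(turnover_pct(purchases=40.0, sales=55.0, avg_net_assets=100.0))   # 40.0
print(expense_ratio_pct(expenses_incl_fees=0.9, avg_net_assets=100.0))
```

Because the definition takes the minimum of purchases and sales, the extra $15 million of sales in this example does not raise measured turnover.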
The analysis of how performance relates to these characteristics is based on two benchmarks, the reliable P8 and the unreliable F10 benchmark. The latter benchmark is included to illustrate how benchmark inefficiency can lead to spurious inferences about the determinants of true performance in studies of this kind. Table 7 presents a multiple regression of Jensen Measure performance on net asset value, a load dummy, expenses net of fees, turnover, and fees. A load dummy is used instead of the load itself because for some of the funds, the load charges depend on the amount invested and whether one is a new or old customer. The regression also includes dummy variables to control for investment objective, since prior work has shown that investment objective is related to performance.
[15] A sixth characteristic reported by Wiesenberger (1975), cash yield, which is cash divided by net asset value, was not analyzed because it is highly related to fund objective. The relation between fund performance and fund investment objective has already been documented in Grinblatt and Titman (1989a).
These regressions provide insights into whether differences in performance between funds are due to differences in transaction costs. If transaction costs are important, we expect significant negative estimates for the fees, expenses, and turnover coefficients, and significant positive estimates for the net asset value and load coefficients, since large funds or funds whose marketing expenses are borne by brokers collecting load-related commissions may economize on transaction costs. One might also observe a negative coefficient on net asset value if small funds have less impact on the market with their buy and sell orders than do large funds. The coefficient on net asset value may also provide insights about survivorship bias in the sample, which could induce a negative correlation between net asset value and performance. Positive coefficients on fees and expenses would indicate the existence of superior performance: if investors were aware that a fund manager was capable of earning abnormal returns, they would be willing to compensate the manager with higher fees and expenses (which might provide nonpecuniary benefits). A positive coefficient on turnover would also indicate the existence of superior performance, implying that better managers trade more to take advantage of their superior information.
A. Calculating Appropriate t-Statistics and F-Statistics
Statistical significance cannot be inferred from the ordinary t-statistics derived from cross-sectional regressions of fund performance on fund characteristics. These t-statistics are biased because the residuals are correlated across mutual funds. For this reason, the t-statistics we report are derived from a time series procedure that is an extension of the procedure used by Fama and MacBeth (1973) and Sefcik and Thompson (1986). Similar time series F-statistics test whether the five fund characteristics (excluding the investment objective dummies) jointly explain performance. These time series procedures are described in Appendix C.
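The general idea behind time series tests of this kind can be sketched for the simplest case. Because each fund's Jensen alpha is a linear function of its returns, the cross-sectional slope of alphas on a characteristic equals the Jensen alpha of a long-short portfolio of funds whose weights come from the cross-sectional regression; the intercept t-statistic of that portfolio's time series regression then reflects any correlation of residuals across funds. This is a hedged sketch under stated assumptions (one characteristic, one benchmark portfolio, simulated data, and helper names of our own), not the Appendix C procedure itself.

```python
import math
import random


def simple_ols(x, y):
    """Intercept, slope, and intercept t-statistic from OLS of y on x (with a constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    s2 = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return intercept, slope, intercept / se_intercept


rng = random.Random(2)
T, N = 120, 50
bench = [rng.gauss(0.006, 0.045) for _ in range(T)]       # benchmark excess returns
common = [rng.gauss(0.0, 0.01) for _ in range(T)]         # shock shared across funds
turnover = [rng.uniform(10.0, 150.0) for _ in range(N)]   # hypothetical characteristic
loading = [rng.uniform(0.5, 1.5) for _ in range(N)]       # fund exposure to the shock
# Fund excess returns: turnover-related alpha, beta of 0.95, a cross-correlated
# common shock, and idiosyncratic noise (all assumptions for illustration).
funds = [[1e-5 * turnover[i] + 0.95 * bench[t] + loading[i] * common[t]
          + rng.gauss(0.0, 0.02) for t in range(T)] for i in range(N)]

# Cross-sectional route: Jensen alpha per fund, then slope of alpha on turnover.
alphas = [simple_ols(bench, r)[0] for r in funds]
cs_slope = simple_ols(turnover, alphas)[1]

# Portfolio route: weights (x_i - mean x) / sum_j (x_j - mean x)^2 sum to zero and
# reproduce the cross-sectional slope as a weighted sum of fund returns.
mean_x = sum(turnover) / N
sxx = sum((x - mean_x) ** 2 for x in turnover)
weights = [(x - mean_x) / sxx for x in turnover]
port = [sum(w * funds[i][t] for i, w in enumerate(weights)) for t in range(T)]
port_alpha, _, port_t = simple_ols(bench, port)

print(f"cross-sectional slope {cs_slope:.3e}, portfolio alpha {port_alpha:.3e}, "
      f"time series t = {port_t:.2f}")
```

The two routes give the same point estimate; only the second yields a standard error that accounts for the common shock.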
B. Empirical Results
The results of two multiple regressions are reported in Table 7. The regressions with the factor benchmark indicate that (controlling for all the other variables) there is no statistically reliable relation between the performance scores and expense ratios or net asset values, but there is a significant relation between performance and both management fees and load. However, these relations are most likely due to the inefficiency of the factor benchmark.
A coefficient of -0.000833 (that is, -1/1200) on the fee variable is consistent with fees merely reducing performance by the amount of the fee, since performance is a monthly decimal return while the fee is an annual percentage. For the P8 benchmark, the marginally insignificant fee coefficient is less than 1.3 standard deviations away from -0.000833. For the F10 benchmark, the estimated fee coefficient is more than 3.5 standard deviations away from this value. While it is possible that low fee managers are superior portfolio performers, one would most likely have expected the opposite result. Moreover, if low fee managers are superior performers, one would not expect their performance to be on the order of 50 basis points per year for every 10 basis point reduction in the fee. Yet, this is what the F10 fee coefficient implies.
TABLE 7
Cross-Sectional Slope Coefficients for Two Multiple Regressions
with Fund Characteristics as the Regressors
           Characteristic
           Net Asset   Expense     Mgt.        Fund        Load        F-Stat.     No. of Funds
Benchmark  Value       Ratio       Fee         Turnover    Dummy       (p-Value)   in Regression
F10        -6.96E-7    4.15E-4     -4.37E-3    1.32E-5     8.32E-4     5.87        209
           (-1.60)     (0.719)     (-4.40**)   (2.29*)     (2.50**)    (0.000**)
P8         -4.03E-7    -4.08E-4    -2.55E-3    1.31E-5     6.15E-5     2.40        209
           (-0.84)     (-0.67)     (-1.99*)    (2.46*)     (0.16)      (0.042*)
This table presents the five slope coefficients and time series t-statistics for the cross-sectional regression of mutual fund performance on net asset value, expense ratio, management fee, fund turnover, a load dummy, and six dummy variables for investment objective. To calculate performance, we use the Jensen Measure with both the eight characteristic benchmark (P8) and the 10 factor portfolio benchmark (F10). The time series t-statistics (described in Appendix C) are below the slope coefficients in parentheses. Time series F-statistics and significance levels (p-values) are also reported. These test whether the five slope coefficients for the noninvestment objective variables are jointly zero.
*Significant at 0.05 level.
**Significant at 0.01 level.
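The unit conversion behind the -1/1200 benchmark and the "50 basis points per 10 basis points" statement can be checked with a few lines of arithmetic. This sketch assumes, as the 1/1200 figure implies, that performance is a monthly decimal return and the fee an annual percentage; the standard error is backed out from the rounded t-statistic in Table 7, so the result is approximate.

```python
# A coefficient that exactly offsets the fee: 1 percentage point per year
# equals 0.01 / 12 per month, i.e. 1/1200.
offsetting_coefficient = -0.01 / 12

# F10 fee coefficient and its time series t-statistic, from Table 7.
f10_coefficient, f10_t = -4.37e-3, -4.40

# Effect of a 10 basis point (0.10 percentage point) fee reduction.
fee_change = -0.10
monthly_effect = f10_coefficient * fee_change   # +0.000437 per month
annual_bp = monthly_effect * 12 * 10_000        # about 52 basis points per year

# Standard error backed out from the reported (rounded) t-statistic.
se = f10_coefficient / f10_t
distance_in_se = (offsetting_coefficient - f10_coefficient) / se

print(f"{annual_bp:.1f} bp per year; {distance_in_se:.2f} standard errors from -1/1200")
```

Both numbers agree with the text: roughly 52 basis points per year for a 10 basis point fee cut, and about 3.6 standard errors from the fee-offsetting value.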
Our conclusions about benchmark inefficiency being responsible for the load
coefficient with the factor benchmark are largely driven by heretofore unreported
work. In this work, we found that the "load portfolio" (long in load funds and short
in no load funds) is significantly negatively correlated with the returns of large firms
and firms with high dividend yields, and significantly positively correlated with
low dividend yield firms. Hence, the significant relation between performance and
load may have been caused by the negative large firm bias and positive dividend
yield bias of the factor benchmark (see Grinblatt and Titman (1988)).
Table 7 also reports significant positive relations between portfolio turnover and performance. The time series t-statistics were 2.46 with the P8 benchmark, and 2.50 for the factor benchmark. The evidence from the P8 benchmark, which has no apparent performance bias, is inconsistent with the null hypothesis of no performance ability. Under this hypothesis, there should be a negative relation between turnover and measured performance when trading costs are netted out of returns. In addition, the time series F-statistic for the joint significance of the five characteristics with the P8 benchmark is at the margin for the 5-percent significance level cutoff, indicating that multiple comparisons are not responsible for our finding that fund characteristics, particularly turnover, may be important determinants of performance.
The turnover coefficient for the P8 benchmark implies that a strategy of buying high turnover funds and shorting low turnover funds achieves positive risk-adjusted returns. Note, however, that investors could not have earned these abnormal returns, since low turnover open-end mutual funds cannot be sold short. For this reason, using the P8 benchmark, we examined portfolio strategies that consisted of equally-weighted portfolios of either the top 20 percent or the bottom 20 percent of mutual funds ranked on these characteristics.[16] The results from these strategies, reported in Table 8, represent abnormal returns that an investor could have earned. They indicate that the positive relation between turnover and performance with the P8 benchmark is due both to high turnover funds doing well and to low turnover funds doing poorly. The abnormal return for the high turnover portfolio is about 0.8 percent per year, and the low turnover portfolio's performance is about -1.3 percent per year.[17] The positive performance of the high turnover portfolios is largest for the aggressive growth high turnover funds, about 2.8 percent per year.
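The extreme-portfolio strategy can be sketched as follows: rank funds on a characteristic, form equally-weighted portfolios of the top and bottom 20 percent, and take the Jensen intercept (and its t-statistic) of each portfolio and of their return difference. This is a simplified, hypothetical sketch: it uses one benchmark portfolio rather than the eight P8 portfolios, simulated returns, and function names of our own.

```python
import math
import random


def jensen_alpha_t(bench, port):
    """Intercept (Jensen alpha) and its t-statistic from OLS of port on bench."""
    n = len(bench)
    mx, my = sum(bench) / n, sum(port) / n
    sxx = sum((x - mx) ** 2 for x in bench)
    beta = sum((x - mx) * (y - my) for x, y in zip(bench, port)) / sxx
    alpha = my - beta * mx
    s2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(bench, port)) / (n - 2)
    return alpha, alpha / math.sqrt(s2 * (1.0 / n + mx * mx / sxx))


def extreme_portfolios(characteristic, fund_returns, frac=0.2):
    """Equally-weighted returns of the top and bottom `frac` of funds ranked on
    a characteristic, and the top-minus-bottom return difference."""
    n = len(characteristic)
    k = max(1, round(frac * n))
    order = sorted(range(n), key=lambda i: characteristic[i])
    bottom, top = order[:k], order[-k:]
    periods = range(len(fund_returns[0]))
    top_ret = [sum(fund_returns[i][t] for i in top) / k for t in periods]
    bottom_ret = [sum(fund_returns[i][t] for i in bottom) / k for t in periods]
    diff = [a - b for a, b in zip(top_ret, bottom_ret)]
    return top_ret, bottom_ret, diff


# Hypothetical data: 279 funds, 120 months, one benchmark, and a small
# turnover-related alpha built in by construction.
rng = random.Random(3)
bench = [rng.gauss(0.006, 0.045) for _ in range(120)]
turnover = [rng.uniform(5.0, 200.0) for _ in range(279)]
funds = [[5e-6 * x + 0.95 * b + rng.gauss(0.0, 0.02) for b in bench] for x in turnover]

for label, series in zip(("top 20%", "bottom 20%", "difference"),
                         extreme_portfolios(turnover, funds)):
    alpha, t_stat = jensen_alpha_t(bench, series)
    print(f"{label:>11}: alpha {alpha:+.4f} per month (t = {t_stat:.2f})")
```

Unlike the regression coefficient, the top and bottom portfolios are long-only positions, which is why their returns could actually have been earned.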
There may have been a profit opportunity for mutual fund investors who bought high turnover funds. However, these tests lack the statistical power to establish it. For example, if we break the sample of funds down by investment objective and examine the pairs of extreme turnover funds, there are no significant t-statistics in any investment objective category. As a general rule, this is true for the other fund characteristics as well. Other than the low fee aggressive growth funds (at a significance level that does not survive a multiple comparisons hurdle), there are no significant t-statistics for the performance of funds with extreme amounts of any characteristic.
The differences between the extreme portfolios in Table 8 represent an alternative functional form for the relation between these characteristics and performance. These numbers, reported in the rows labeled "Difference" in Table 8, are thus related to those in Table 7. With the P8 benchmark, the differences between the risk-adjusted returns of the extreme turnover portfolios are significant, supporting the findings of the cross-sectional regressions. However, the extreme turnover portfolio test does not appear to have enough power to determine whether turnover is an important determinant of performance for subgroups of funds with the same investment objective.
The Table 8 tests appear to have enough power to conclude that fund characteristics matter as a group, even amongst funds with the same investment objective. The F-statistics in the rightmost column, which test whether the four portfolios grouped by fund characteristic have zero performance, are significant for both the aggressive growth and growth income categories. Although we examined three investment objective categories to arrive at two significant ones, the F-statistic of 4.05 for the growth income category has a significance level of 0.004, which is significant at the 5-percent level after adjusting for the multiple comparison with the Bonferroni inequality.[18]
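The Bonferroni adjustment multiplies each raw p-value by the number of comparisons, which bounds the probability that any of them appears significant by chance. A minimal sketch, applied for illustration to the p-values of the "Difference" F-tests for the three investment objective categories in Table 8 (the function name is our own):

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: raw p-value times the number of tests, capped at one."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# "Difference" row F-test p-values from Table 8, one per investment objective.
raw = {"aggressive growth": 0.023, "growth": 0.152, "growth income": 0.004}
adjusted = bonferroni(raw)
for name, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: adjusted p = {p:.3f} ({verdict} at 5 percent)")
```

The growth income category survives the adjustment (0.004 x 3 = 0.012), while the aggressive growth Difference test (0.023 x 3 = 0.069) does not.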
[16] The analysis of funds with extreme loads was not carried out because, in 1974, virtually all load funds had the same maximum load of 8.5 percent.
[17] The top 10 percent of funds, ranked by turnover, have performance that is about twice as large, or about 1.5 percent per year.
[18] We also examined t-statistics from extreme portfolios reconstructed each year using characteristics from the previous year rather than characteristics from the 1975 edition of Wiesenberger. In contrast to the portfolios tested in Table 8, the weights of these portfolios change over time as the fund characteristics change. These dynamic investment strategies in mutual funds have about the same abnormal returns as the static strategy that is based on the 1975 characteristics.
TABLE 8
Jensen Measures of Equally-Weighted Portfolios of Funds for Extreme Deciles of Funds
Sorted by Four Characteristics
                                  Characteristic
                                  Net Asset   Expense     Mgt.        Fund        F-Test
Sample              Criterion     Value       Ratio       Fee         Turnover    (p-Value)
All Funds           Top 20%       -0.0004     0.0005      -0.0005     0.0007      3.61
                                  (-0.67)     (0.60)      (-0.50)     (0.66)      (0.008)**
                    Bottom 20%    -0.0000     -0.0014     -0.0006     -0.0011     3.82
                                  (-0.056)    (-2.38)     (-0.98)     (-1.92)     (0.006)**
                    Difference    -0.0004     0.0019      0.0001      0.0017      7.41
                                  (-0.48)     (2.39)      (0.12)      (2.01)*     (0.000)**
Aggressive Growth   Top 20%       -0.0000     0.0029      0.0027      0.0024      1.83
                                  (-0.01)     (1.83)      (1.39)      (1.20)      (0.128)
                    Bottom 20%    0.0007      -0.0005     0.0031      -0.0000     3.53
                                  (0.47)      (-0.38)     (2.09)*     (-0.00)     (0.010)**
                    Difference    -0.0007     0.0034      -0.0004     0.0024      2.98
                                  (-0.50)     (2.64)**    (-0.34)     (1.45)      (0.023)*
Growth              Top 20%       -0.0002     -0.0013     -0.0015     -0.0000     1.48
                                  (-0.28)     (-1.38)     (-1.58)     (-0.01)     (0.222)
                    Bottom 20%    -0.0007     -0.0012     0.0001      -0.0008     1.44
                                  (-0.90)     (-1.69)     (0.13)      (-1.11)     (0.225)
                    Difference    0.0005      -0.0001     -0.0016     0.0008      1.71
                                  (0.56)      (-0.13)     (-1.73)     (0.79)      (0.152)
Growth Income       Top 20%       -0.0003     0.0004      -0.0003     0.0003      0.52
                                  (-0.37)     (0.53)      (-0.31)     (0.31)      (0.822)
                    Bottom 20%    -0.0009     -0.0013     -0.0009     -0.0003     1.35
                                  (-1.17)     (-1.67)     (-1.19)     (-0.40)     (0.255)
                    Difference    0.0007      0.0016      0.0007      0.0006      4.05
                                  (0.93)      (2.50)*     (0.72)      (0.65)      (0.004)**
This table presents the average monthly performance (in decimal form) and time series t-statistics for mutual funds ranked in the top 20 percent and the bottom 20 percent in terms of net asset value, expense ratio, management fee, and fund turnover. Results for three subsamples based on a fund's investment objective classification are also reported. The difference between the performance of the top 20 percent and bottom 20 percent is also reported. To calculate performance we use the Jensen Measure with the eight characteristic benchmark (P8). The time series t-statistics are in parentheses below the abnormal return. The time series t-statistics are the standard intercept t-statistics from a regression of the returns of equally-weighted portfolios (or the return difference between two equally-weighted portfolios) against the excess returns of the benchmark portfolio(s). In addition, this table reports time series F-tests and associated significance levels (p-values). These test whether the performance of the four portfolios in the row is jointly zero.
*Significant at 0.05 level.
**Significant at 0.01 level.
VII. Conclusion
This study contains three contributions to the literature on portfolio performance
evaluation. First, it examines the sensitivity of performance inferences to
benchmark choice. Second, it compares the Jensen Measure with two new mea-
sures that were developed to overcome the timing-related biases of the Jensen
Measure. Finally, it analyzes whether fund performance is related to fund at-
tributes.
We find that the choice of a benchmark can have a large effect on inferences about performance. However, this does not mean that all results about mutual fund performance are spurious. Rather, it means that care must be taken to avoid inefficient benchmarks. For instance, the mutual funds display strong negative performance, on average, with either the equally-weighted index or the factor-based benchmark and virtually zero performance with the benchmark formed from securities characteristics. This difference is most probably due to the size-related biases of the equally-weighted and factor-based benchmarks. In our sample period, the stocks of large firms perform poorly relative to these benchmarks and, as a result, mutual funds, which on average purchase larger than average stocks, also perform poorly relative to those benchmarks. In addition to affecting average performance, benchmarks have a large effect on how funds perform relative to each other. In particular, the correlations between the performance numbers generated by the characteristics-based benchmark and those generated with the other benchmarks were low.
The different measures of performance that were examined in the paper, the Jensen Measure, the Treynor-Mazuy Total Performance Measure, and the Positive Period Weighting Measure, displayed high cross-sectional correlations. This suggests that the concerns of Jensen (1972), Admati and Ross (1985), Dybvig and Ross (1985), and Grinblatt and Titman (1989b) regarding a timing-related problem in the Jensen Measure may not be important in practice since measures that eliminate this problem yield almost identical inferences.[19] We believe, however, that the measures are similar because very few funds successfully time the market. In fact, the measures are significantly different for those funds that appear to have successfully timed the market.
The latter part of the paper presented tests to examine the determinants of
mutual fund performance. These tests analyzed whether performance, as measured
by the only reliable benchmark, the P8 benchmark, is related to fund size, expenses,
management fee, portfolio turnover, and load. We found that performance is
positively related to portfolio turnover, but not to the size of the mutual funds or
to the expenses that the funds generate. This suggests that the funds that spend
the most on research and trade the most may, in fact, be uncovering underpriced
stocks.
These results are related to the earlier work of Grinblatt and Titman (1989a), (1993), which found evidence of abnormal performance with the P8 benchmark based on hypothetical returns constructed from the funds' exchange-listed equity holdings. Because these hypothetical returns are computed without deducting fees and transaction costs, the results do not necessarily imply that investors could gain by buying shares in the funds. The evidence of abnormal performance found in this paper may be more surprising since we are measuring actual returns net of transaction costs.
[19] The high correlations also imply that the Positive Period Weighting Measure is robust to our specification of the weights. This is because the Jensen Measure and the Treynor-Mazuy Total Performance Measure are Period Weighting Measures (without the nonnegativity constraint) that are based on utility functions that differ substantially from the power utility function used to calculate the weights in this study. For further discussion of this, see Grinblatt and Titman (1988), (1989b).
Appendix A. The Positive Period Weighting Measure
The period weights used in this study can be interpreted as the marginal utilities of an investor with power utility. Since this utility function does not exhibit satiation, period weights derived from it satisfy the nonnegativity constraint. A risk aversion parameter of eight was chosen because it generated an optimal portfolio that required almost no holdings of the risk-free asset. To calculate a set of weights for each of the six benchmarks, we:
1) Applied an algorithm that searched for the utility optimal combination of the portfolios in the benchmark and the risk-free asset, i.e., solved for $\gamma$ such that
$$\gamma = \arg\max_{\gamma} E[U(W)] = \arg\max_{\gamma} \sum_t -\frac{1}{7}\left(1 + r_{ft} + \gamma R_{It}\right)^{-7},$$
where $R_{It}$ is the excess return of the benchmark. (In the case of multiple portfolio benchmarks, $\gamma$ is a row vector and $R_{It}$ is a column vector of period $t$ excess returns with element $i$ corresponding to the $i$th portfolio in the benchmark.)
2) Calculated the time series of gross returns of the optimal portfolio, $1 + r_{ft} + \gamma R_{It}$.
3) Interpreted the gross returns as wealth levels (i.e., without loss of generality, set initial wealth to one for each observation), and calculated the marginal utility of this wealth level with the power function, i.e.,
$$\text{marginal utility at } t = \left(1 + r_{ft} + \gamma R_{It}\right)^{-8}.$$
4) Rescaled the marginal utilities to be weights that sum to one,[20] i.e.,
$$w_t = \frac{\left(1 + r_{ft} + \gamma R_{It}\right)^{-8}}{\sum_{s}\left(1 + r_{fs} + \gamma R_{Is}\right)^{-8}}.$$
5) Computed performance as the dot product of the weight vector and the excess return vector of the portfolio to be evaluated, i.e.,
$$PW = \sum_t w_t R_{pt},$$
where PW = Positive Period Weighting Measure.
Since the first order condition for utility maximization requires that
$$E\left[U'(W_t)\, R_{It}\right] = 0$$
for the excess return, $R_{It}$, of each portfolio in the benchmark, the five-step procedure described above derives weights that approximately satisfy the weighted excess return condition
$$\sum_t w_t R_{It} = 0.$$
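The five steps can be sketched for a single-portfolio benchmark. Because the first-order condition is strictly decreasing in γ, this sketch finds γ by bisection on that condition rather than by a general search routine; the function name, the bisection approach, and the simulated returns are assumptions for illustration, and the multiple-portfolio case (vector γ) is not handled.

```python
import random


def positive_period_weights(bench_excess, riskfree, risk_aversion=8.0, iters=200):
    """Return (gamma, weights) for a single-portfolio benchmark under power utility.

    Wealth is W_t = 1 + r_ft + gamma * R_It and marginal utility is
    W_t ** -risk_aversion; gamma solves sum_t U'(W_t) * R_It = 0.
    """
    if not (min(bench_excess) < 0.0 < max(bench_excess)):
        raise ValueError("benchmark needs both positive and negative excess returns")

    def wealth(g):
        return [1.0 + rf + g * r for rf, r in zip(riskfree, bench_excess)]

    def foc(g):
        return sum(w ** -risk_aversion * r for w, r in zip(wealth(g), bench_excess))

    # Bracket gamma so that wealth stays strictly positive in every period.
    hi = 0.999 * min((1.0 + rf) / -r for rf, r in zip(riskfree, bench_excess) if r < 0)
    lo = -0.999 * min((1.0 + rf) / r for rf, r in zip(riskfree, bench_excess) if r > 0)
    for _ in range(iters):   # foc is strictly decreasing in gamma
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    marginal_utility = [w ** -risk_aversion for w in wealth(gamma)]   # step 3
    total = sum(marginal_utility)
    return gamma, [m / total for m in marginal_utility]              # step 4


# Hypothetical monthly data: benchmark and fund excess returns, constant risk-free rate.
rng = random.Random(4)
bench = [rng.gauss(0.006, 0.045) for _ in range(120)]
rf = [0.005] * 120
fund = [0.001 + 1.1 * b + rng.gauss(0.0, 0.02) for b in bench]

gamma, w = positive_period_weights(bench, rf)
pw = sum(wt * r for wt, r in zip(w, fund))   # step 5: Positive Period Weighting Measure
print(f"gamma = {gamma:.3f}, PW = {pw:.4f} per month, "
      f"sum of w * R_I = {sum(wt * r for wt, r in zip(w, bench)):.1e}")
```

The printed weighted benchmark excess return is essentially zero, which is the condition above; the weights are positive because marginal utility under power utility is always positive.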
The Positive Period Weighting Measure, like the Jensen Measure, is a linear weighting of returns. The standard t-statistic can thus be used to test whether it significantly differs from zero, when conditioned on the excess returns of the portfolios in the benchmark. The test statistic, which has a t-distribution[21] with $T-K-1$ degrees of freedom if there are $T$ returns and $K$ benchmark portfolios, is
where $s$ is the standard error of the excess return regression used to compute the portfolio's Jensen Measure.
[20] The weights are scaled to sum to one so that observed Positive Period Weighting Measures can be interpreted as monthly abnormal returns.
The t-statistic formula above applies whether the test is based on the returns of a single fund or the returns of a portfolio of funds. In Table 1, the returns of an equally-weighted portfolio of the respective funds (passive funds or mutual funds) are used to generate a t-test of whether average performance across funds is zero.
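For a single benchmark portfolio (K = 1), the test statistic can be sketched as follows. The weights are taken as given (for example, from the five-step procedure), the Jensen regression is assumed to be an ordinary least squares fit of the portfolio's excess return on a constant and the benchmark excess return, and the function name `pw_t_statistic` is hypothetical.

```python
import math


def pw_t_statistic(weights, bench, port):
    """t-statistic for a Positive Period Weighting measure with one benchmark (K = 1).

    weights: PW weights w_t (taken as given)
    bench:   benchmark excess returns R_It
    port:    excess returns R_pt of the evaluated fund or portfolio of funds
    Returns (t, degrees_of_freedom).
    """
    T = len(port)
    mb = sum(bench) / T
    mp = sum(port) / T
    # Jensen regression: OLS of R_pt on a constant and R_It.
    beta = sum((x - mb) * (y - mp) for x, y in zip(bench, port)) / sum(
        (x - mb) ** 2 for x in bench
    )
    alpha = mp - beta * mb
    resid = [y - alpha - beta * x for x, y in zip(bench, port)]
    dof = T - 1 - 1  # T - K - 1 with K = 1 benchmark portfolio
    s = math.sqrt(sum(e * e for e in resid) / dof)  # regression standard error
    pw = sum(w * y for w, y in zip(weights, port))
    # Conditional on the benchmark, Var(PW) = s^2 * sum_t w_t^2.
    t = pw / (s * math.sqrt(sum(w * w for w in weights)))
    return t, dof
```

Applying the same function to the equally-weighted average of the funds' excess returns gives the cross-fund test described for Table 1.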
Appendix B. The Treynor-Mazuy Quadratic Regression
The Treynor-Mazuy quadratic regression is
$$R_{pt} = \alpha_p + \beta_{1p}R_{It} + \beta_{2p}R_{It}^2 + \epsilon_{pt},$$
where $R_{It}$ is the excess return of the benchmark portfolio and $R_{pt}$ is the excess return of the portfolio being evaluated. Jensen (1972) and Admati et al. (1986), among others, have analyzed the asymptotic properties of the two slope coefficients in the regression when the portfolio strategy has linear risk adjustments to timing signals. They noted that the second slope coefficient, which measures co-skewness with the benchmark portfolio, is related to timing ability. In this special case, it is trivial to prove that, in large samples, the contribution of timing information to the excess return of the portfolio is proportional to the coefficient on the quadratic term.
The Treynor-Mazuy Total Performance Measure is defined to be
$$TM = \alpha_p + \beta_{2p}\,\mathrm{Var}(R_I),$$
which, again, is easily demonstrated to be the added return from superior information under the Admati et al. (1986) assumptions (i.e., exponential utility and multivariate normality).
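The quadratic regression and the total performance measure can be sketched with ordinary least squares on the regressors $(1, R_I, R_I^2)$. Assumptions made here: a single benchmark, a small hand-written solver for the 3 × 3 normal equations (no external libraries), and $\mathrm{Var}(R_I)$ computed with divisor T; the names `solve` and `treynor_mazuy` are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def treynor_mazuy(bench, port):
    """Fit R_pt = alpha + b1 R_It + b2 R_It^2 + e_pt and return (TM, (alpha, b1, b2))."""
    X = [[1.0, x, x * x] for x in bench]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(X, port)) for i in range(3)]
    alpha, b1, b2 = solve(xtx, xty)  # OLS via the normal equations
    T = len(bench)
    mean = sum(bench) / T
    var = sum((x - mean) ** 2 for x in bench) / T  # Var(R_I), divisor T (an assumption)
    return alpha + b2 * var, (alpha, b1, b2)
```

The normal equations are adequate for a three-regressor illustration; a QR-based least squares routine would be the more numerically robust choice for long, badly scaled return series.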
The Treynor-Mazuy Measure, like the Jensen Measure, is a linear weighting of returns. The standard t-statistic can thus be used to test whether it differs significantly from zero, conditional on the excess returns of the portfolios in the benchmark. The test statistic, which has a t-distribution with T - K - 1 degrees of freedom if there are T returns and K benchmark portfolios, is
$$TM/s(TM),$$
where s(TM) is the standard error of the Treynor-Mazuy total performance measure.
To compute this standard error, we:
1) Computed s(e), the standard error of the excess return regression used to compute the Jensen Measure for the portfolio being evaluated.
^21 This assumes that the benchmark portfolios combine to form a point on the efficient frontier of the 109 test portfolios and that the test portfolios have normally distributed residuals.
2) Computed the variance-covariance matrix of the three coefficients in the quadratic regression, conditional on the benchmark excess return.^22 This is
$$V = s^2(e)\,(X'X)^{-1},$$
where X is the T × 3 matrix of regressors in the Treynor-Mazuy quadratic regression.
3) Computed s(TM) as the square root of $q'Vq$, where the 1 × 3 row vector
$$q' = (1,\ 0,\ \mathrm{Var}(R_I))$$
and $\mathrm{Var}(R_I)$ is the variance of the benchmark portfolio's excess return. (In the case of multiple-portfolio benchmarks, we employ returns from the ex post efficient combination of the portfolios in the benchmark.)
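The three steps can be sketched for a single benchmark as follows. Assumptions made here: s(e) comes from an OLS fit of the Jensen regression with divisor T - 2 (that is, T - K - 1 with K = 1), $\mathrm{Var}(R_I)$ uses divisor T, and $q'(X'X)^{-1}q$ is obtained by solving $(X'X)z = q$ rather than forming the inverse; the names `solve` and `tm_standard_error` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row for column c
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def tm_standard_error(bench, port):
    """Standard error s(TM) following steps 1-3, single benchmark (K = 1)."""
    T = len(bench)
    mb = sum(bench) / T
    mp = sum(port) / T
    # Step 1: s(e) from the Jensen regression of R_pt on a constant and R_It.
    beta = sum((x - mb) * (y - mp) for x, y in zip(bench, port)) / sum(
        (x - mb) ** 2 for x in bench
    )
    alpha = mp - beta * mb
    s2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(bench, port)) / (T - 2)
    # Step 2: V = s^2(e) (X'X)^(-1), with X = [1, R_I, R_I^2] (T x 3).
    X = [[1.0, x, x * x] for x in bench]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    # Step 3: q'Vq = s^2(e) q'(X'X)^(-1) q, computed by solving (X'X) z = q.
    var = sum((x - mb) ** 2 for x in bench) / T  # Var(R_I), divisor T (an assumption)
    q = [1.0, 0.0, var]
    z = solve(xtx, q)
    return math.sqrt(s2 * sum(qi * zi for qi, zi in zip(q, z)))
```

Using the linear-regression s(e) rather than the quadratic one mirrors the conditioning choice described in the accompanying footnote on fair comparisons with the other measures.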
The t-statistic formula above applies whether the test employs the returns of a single fund or the returns of a portfolio of funds. The t-statistics reported in Table 1 for the passive portfolios and mutual funds use the returns of an equally-weighted portfolio of the respective funds to determine whether average performance across funds is zero.
Appendix C. Time-Series Test Statistics for Tables 3, 5, 6, and 7
Given N funds, let
$\alpha$ = N × 1 vector of "performance" (for Table 6, the difference in performance with two measures),
R = T × N matrix with the entry in row t and column n comprised of fund n's excess return in month t.
By assumption, the row vectors of R are i.i.d. normal conditional on the excess returns of the portfolio(s) in the benchmark. We note that
(A-1) $\alpha = R'w$,
where w is a T × 1 weight vector with elements that sum to one (zero for Table 6) and depend solely on the benchmarks' excess returns and the measure used.
Let
A = N × K matrix comprised of a column of ones and K - 1 columns of fund attributes, denoted $a_k$, k = 1, ..., K - 1.
$$M = \begin{pmatrix} M_0 \\ \vdots \\ M_{K-1} \end{pmatrix} = K \times N \text{ partitioned matrix} = (A'A)^{-1}A'.$$
^22 We compute the standard error of the regression conditional only on the excess return of the benchmark, but do not additionally condition on the squared excess return, to permit fair comparisons with the Positive Period Weighting and Jensen Measures.
(A-2) $R_{pk}$ = T × 1 vector = $RM_k'$, k = 0, ..., K - 1,
$e_k$ = T × 1 residual vector from regressing $R_{pk}$ on the benchmark portfolio(s); $e_{kt}$ is its tth element.
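Consider the cross-sectional regression
$$\alpha = \gamma_0\mathbf{1} + \sum_{k=1}^{K-1}\gamma_k a_k + u,$$
where $\mathbf{1}$ is the column of ones in A and u is the residual vector. The coefficients satisfy
(A-3) $\gamma_k = M_k\alpha = M_kR'w = R_{pk}'w$, k = 0, ..., K - 1,
by Equations (A-1) and (A-2).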
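Equation (A-3) says that each cross-sectional coefficient equals the measure's weighted sum of returns on a "characteristic portfolio" with holdings $M_k$. The following is a small numerical check of this identity, assuming one fund attribute (K = 2) so that the rows of $(A'A)^{-1}A'$ have a closed form; the helper names, the weights, and the return numbers are made up for illustration.

```python
def m_rows(attr):
    """Rows M_0 and M_1 of (A'A)^(-1) A' for A = [column of ones, a_1] (K = 2)."""
    n = len(attr)
    ma = sum(attr) / n
    sxx = sum((a - ma) ** 2 for a in attr)
    m1 = [(a - ma) / sxx for a in attr]  # slope row
    m0 = [1.0 / n - ma * v for v in m1]  # intercept row
    return m0, m1


def cross_section_ols(perf, attr):
    """Intercept and slope from regressing fund performance on one attribute."""
    n = len(perf)
    ma, mp = sum(attr) / n, sum(perf) / n
    g1 = sum((a - ma) * (p - mp) for a, p in zip(attr, perf)) / sum(
        (a - ma) ** 2 for a in attr
    )
    return [mp - g1 * ma, g1]


def via_characteristic_portfolios(R, w, attr):
    """gamma_k = R_pk' w, where R_pk = R M_k' is a T x 1 return series (A-2, A-3)."""
    gammas = []
    for m in m_rows(attr):
        r_pk = [sum(r * h for r, h in zip(row, m)) for row in R]
        gammas.append(sum(wt * r for wt, r in zip(w, r_pk)))
    return gammas


# T = 4 months, N = 3 funds; rows of R are months, columns are funds.
R = [[0.01, 0.02, -0.01],
     [0.03, -0.01, 0.00],
     [-0.02, 0.01, 0.02],
     [0.00, 0.04, 0.01]]
w = [0.4, 0.3, 0.2, 0.1]  # hypothetical measure weights summing to one
attr = [1.0, 2.0, 4.0]    # hypothetical fund attribute a_1
perf = [sum(wt * row[n] for wt, row in zip(w, R)) for n in range(3)]  # (A-1): alpha = R'w
print(cross_section_ols(perf, attr))
print(via_characteristic_portfolios(R, w, attr))
```

The two printed coefficient pairs agree up to floating-point rounding, which is what lets the time-series covariance of the $e_k$ drive the test statistics that follow.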
The K × K covariance matrix of $e_{0t}, \ldots, e_{K-1,t}$ is
$$V = \mathrm{Var}\begin{pmatrix} e_{0t} \\ \vdots \\ e_{K-1,t} \end{pmatrix},$$
which is the same for all t by the earlier i.i.d. assumption. $V_{ij}$ denotes the unbiased estimate of element ij of this matrix, and $V_{(i,j)}$ denotes the $(j - i + 1) \times (j - i + 1)$ matrix consisting of the unbiased sample estimate of the submatrix of V comprised of rows and columns i through j. By Equation (A-3),
$$\mathrm{Var}\begin{pmatrix} \gamma_0 \\ \vdots \\ \gamma_{K-1} \end{pmatrix} = (w'w)V.$$
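To test whether $\gamma_k = 0$, we use the test statistic
$$\frac{\gamma_k}{\sqrt{(w'w)\,V_{kk}}} \sim t(T - P - 1),$$
where P is the number of portfolios in the benchmark and $V_{kk}$ is the unbiased estimate of the kth diagonal element of V.
A joint test of whether $\gamma_k = g_1, \gamma_{k+1} = g_2, \ldots, \gamma_{k+j-1} = g_j$, $j \le K - k$, is given by the test statistic
$$F = \frac{T - P - j}{j(T - P - 1)(w'w)}\,(\gamma - g)'\,V_{(k,\,k+j-1)}^{-1}\,(\gamma - g),$$
where $\gamma = (\gamma_k, \ldots, \gamma_{k+j-1})'$ and $g = (g_1, \ldots, g_j)'$.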
Under our assumptions, F has a small-sample central F-distribution with j and T - P - j degrees of freedom. The proof is a trivial extension of Gibbons, Ross, and Shanken (1989).
To apply these results to Tables 3 and 5, which regress performance with method c (denoted $\alpha_c$) against performance with method d ($\alpha_d$), we let the $\alpha$ above be $\alpha_c$ and let the fund characteristic $a_1$ be $\alpha_d$. We test whether $\gamma_0 = 0$ and $\gamma_1 = 1$ in the F-test, which has k = 0 and j = 2.
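The single-coefficient t-test and the joint F-test can be sketched for the case just described: one fund characteristic (K = 2) and a joint test of $\gamma_0 = 0$ and $\gamma_1 = 1$ (k = 0, j = 2). Assumptions made here: a single benchmark portfolio (P = 1), characteristic-portfolio returns $R_{p0}$ and $R_{p1}$ supplied directly, residual covariances estimated with divisor T - P - 1, and the closed-form inverse of a 2 × 2 matrix; the function names are hypothetical.

```python
import math


def ols_residuals(y, x):
    """Residuals from regressing y on a constant and one benchmark series x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    a = my - b * mx
    return [v - a - b * u for u, v in zip(x, y)]


def coefficient_tests(rp0, rp1, bench, w, g=(0.0, 1.0)):
    """t-statistics for gamma_0, gamma_1 and the joint F-statistic for gamma = g.

    rp0, rp1: characteristic-portfolio returns R_p0 and R_p1 (length T)
    bench:    benchmark excess returns (one portfolio, so P = 1)
    w:        the performance measure's weight vector (length T)
    Returns (t_stats, F, (j, T - P - j), gammas).
    """
    T, P, j = len(bench), 1, 2
    gam = [sum(wt * r for wt, r in zip(w, rp)) for rp in (rp0, rp1)]  # Equation (A-3)
    e = [ols_residuals(rp, bench) for rp in (rp0, rp1)]
    # Unbiased estimate of the residual covariance matrix V (divisor T - P - 1).
    V = [[sum(x * y for x, y in zip(e[r], e[c])) / (T - P - 1) for c in range(2)]
         for r in range(2)]
    ww = sum(wt * wt for wt in w)
    t_stats = [gam[k] / math.sqrt(ww * V[k][k]) for k in range(2)]  # each ~ t(T - P - 1)
    d = [gam[0] - g[0], gam[1] - g[1]]
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    # d' V^(-1) d using the closed-form inverse of a 2 x 2 matrix.
    quad = (V[1][1] * d[0] ** 2 - 2 * V[0][1] * d[0] * d[1] + V[0][0] * d[1] ** 2) / det
    F = (T - P - j) / (j * (T - P - 1) * ww) * quad
    return t_stats, F, (j, T - P - j), gam
```

The returned degrees of freedom can be compared against standard t and F tables; for larger j a general linear solver would replace the explicit 2 × 2 inverse.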
In Table 6, the dependent variable, $\alpha$, is the difference between two measures using the same benchmark. One merely substitutes the difference between the two measures for $\alpha$ above. The weight vector, w, is now the difference between two weight vectors. Since the test in Table 6 is a single-coefficient restriction, the t-test described above is the one that is used.
In Table 7, the dependent variable is performance. The right-side variables in the cross-sectional regression are a vector of ones (for the intercept), five fund characteristics, and a set of investment objective dummies. If we let $a_1$ through $a_5$ in the discussion above be the fund characteristic vectors, it is straightforward to apply an F-test that examines whether $\gamma_1 = \gamma_2 = \cdots = \gamma_5 = 0$. In this test, j = 5.
References
Admati, A.; S. Bhattacharya; P. Pfleiderer; and S. A. Ross. "On Timing and Selectivity." Journal of Finance, 41 (July 1986), 715-730.
Admati, A., and S. A. Ross. "Measuring Investment Performance in a Rational Expectations Equilibrium Model." Journal of Business, 58 (Jan. 1985), 1-26.
Banz, R. "The Relationship between Return and Market Value of Common Stocks." Journal of Financial Economics, 9 (March 1981), 3-18.
Black, F.; M. Jensen; and M. Scholes. "The Capital Asset Pricing Model: Some Empirical Tests." In Studies in the Theory of Capital Markets, M. Jensen, ed. New York, NY: Praeger (1972).
Brown, S. J.; W. Goetzmann; R. G. Ibbotson; and S. A. Ross. "Survivorship Bias in Performance Studies." Review of Financial Studies, 5 (4, 1992), 553-580.
Dybvig, P., and S. A. Ross. "Differential Information and Performance Measurement Using a Security Market Line." Journal of Finance, 40 (June 1985), 383-399.
Elton, E. J.; M. J. Gruber; S. Das; and M. Hlavka. "Efficiency with Costly Information: A Reinterpretation of Evidence from Managed Portfolios." Review of Financial Studies, 6 (1, 1993), 1-22.
Fama, E., and J. MacBeth. "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy, 81 (May-June 1973), 607-636.
Gibbons, M.; S. A. Ross; and J. Shanken. "A Test of the Efficiency of a Given Portfolio." Econometrica, 57 (Sept. 1989), 1121-1152.
Grinblatt, M., and S. Titman. "Mutual Fund Performance: An Analysis of Monthly Returns." Working Paper, Univ. of California, Los Angeles (March 1988).
______. "The Evaluation of Mutual Fund Performance: An Analysis of Quarterly Portfolio Holdings." Journal of Business, 62 (July 1989a), 394-415.
______. "Portfolio Performance Evaluation: Old Issues and New Insights." Review of Financial Studies, 2 (3, 1989b), 396-422.
______. "The Persistence of Mutual Fund Performance." Journal of Finance, 47 (Dec. 1992), 1977-1984.
______. "Performance Measurement without Benchmarks: An Examination of Mutual Fund Returns." Journal of Business, 66 (Jan. 1993), 47-68.
Jensen, M. "The Performance of Mutual Funds in the Period 1945-1964." Journal of Finance, 23 (May 1968), 389-416.
______. "Risk, the Pricing of Capital Assets, and the Evaluation of Investment Portfolios." Journal of Business, 42 (April 1969), 167-247.
______. "Optimal Utilization of Market Forecasts and the Evaluation of Investment Portfolio Performance." In Mathematical Methods in Investment and Finance, Szego and Shell, eds. Amsterdam: North Holland (1972).
Lehmann, B., and D. Modest. "Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons." Journal of Finance, 42 (June 1987), 233-265.
______. "The Empirical Foundations of the Arbitrage Pricing Theory." Journal of Financial Economics, 21 (Sept. 1988), 213-254.
Litzenberger, R., and K. Ramaswamy. "The Effects of Personal Taxes and Dividends on Capital Asset Prices: Theory and Empirical Evidence." Journal of Financial Economics, 7 (June 1979), 163-195.
______. "The Effects of Dividends on Common Stock Returns: Tax Effects or Information Effects?" Journal of Finance, 37 (May 1982), 429-443.
Reinganum, M. "Misspecification of Capital Asset Pricing: Empirical Anomalies Based on Earnings' Yields and Market Values." Journal of Financial Economics, 9 (March 1981), 19-46.
Roll, R. "Ambiguity when Performance Is Measured by the Securities Market Line." Journal of Finance, 33 (Sept. 1978), 1051-1069.
Sefcik, S., and R. Thompson. "An Approach to Statistical Inference in Cross-Sectional Models with Security Abnormal Returns as the Dependent Variable." Journal of Accounting Research, 24 (Autumn 1986), 316-334.
Treynor, J. "How to Rate Management of Investment Funds." Harvard Business Review, 43 (Jan.-Feb. 1965), 63-75.
Treynor, J., and F. Mazuy. "Can Mutual Funds Outguess the Market?" Harvard Business Review, 44 (July-Aug. 1966), 131-136.
Wiesenberger, A. Investment Companies Service. New York, NY: A. Wiesenberger and Co. (1975).