The Problem of Accuracy of Economic Data
For the economic historian in the Austrian tradition, writes Philipp Bagus, the quality of economic data is of utmost importance, since false data or belief in inaccurate data can lead the economic historian to faulty interpretations of the past. The quality of economic data is at least as important for economists who adhere to positivism in economics, since they use economic data to confirm or falsify their models.


Comments (18)
An article in Tuesday's WSJ discussed the change to reporting the CPI out to three decimal places. The author discussed in detail the problem of an underlying figure of 0.249 versus 0.251, which round to different one-decimal figures (0.2% versus 0.3%), and how much difference this makes once the monthly number is grown into an annual one.
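A minimal sketch of the rounding effect described above, assuming the figures are monthly percentage changes, that the published figure is rounded to one decimal place, and that "grown into an annual number" means compounding over 12 months. The function name `annualize` is illustrative only.

```python
# Sketch under stated assumptions: inputs are monthly % changes, published
# figures are rounded to one decimal, annualizing = compounding over 12 months.

def annualize(monthly_pct: float) -> float:
    """Compound a monthly percentage change over 12 months; return percent."""
    return ((1 + monthly_pct / 100) ** 12 - 1) * 100

for underlying in (0.249, 0.251):
    reported = round(underlying, 1)  # the one-decimal figure that gets published
    print(f"underlying {underlying:.3f}% -> annual {annualize(underlying):.2f}%; "
          f"reported {reported:.1f}% -> annual {annualize(reported):.2f}%")
```

The two underlying figures differ by only about 0.02 points once annualized, while the two published figures differ by more than a full percentage point.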
How often does a manager say, "These are the numbers we have, so we just have to use them"?
Another mental game with numerical error is that errors can work in both directions and cancel each other out. You just never know when that is happening. I love how these three-decimal-place numbers are really just a way of CYA for decision makers who don't want to be held accountable. Then again, who can criticize them when most folks love to have someone to blame regardless of effort and reasoning?
Ed
Published: August 17, 2006 9:22 AM
I am inspired that an economics student in Germany has taken an interest in Austrian economics and the works of Ludwig von Mises. The influence of Mises is urgently needed in economies across Europe. Good article, Philipp!
Published: August 17, 2006 10:16 AM
There are two main sources of economic information in free markets.
The first is a byproduct of agency, i.e. the financial reporting of companies to their shareholders. This information is generally of good quality and is capable of aggregation, and it is in fact aggregated in financial capital markets. For example, suppose that the microchip manufacturing industry were generally carried on by listed companies, and that listed companies were generally owned by managed funds. Each microchip manufacturer prepares regular financial reports for its shareholders, and fund managers employ analysts who analyse the market and the companies in it; in doing so, the analysts aggregate the information from the individual companies into data for the industry as a whole. The same would apply to all production carried on mainly by listed companies, because managed funds strongly seek diversification and ultimately end up aggregating data on every industry in which listed companies play a significant part.
The second source of economic information in a free market is the market intelligence sought by traders (whether listed companies or otherwise). This obviously includes economic information that does not go through the financial reporting process. Like fund managers, traders employ analysts to collect and aggregate market data, such as prices and quantities, for analysis. For example, banks and other lenders who lend to the household sector need a good understanding of each household borrower's financial situation in order to manage the credit risk on their loans. As a result, data on households can be aggregated, even though households' financial information is private and confidential. Another example is goods or financial instruments traded in markets organised as a service, such as stock exchanges. In addition to facilitating the purchase and sale of goods and instruments, such markets aggregate and report prices and quantities for various categories of goods and instruments.
Economic information is expensive to report, validate and analyse. If such information is not worth someone's while to pay the expense of reporting, validating and analysing, it is better left unknown.
Published: August 17, 2006 10:21 AM
Interesting article! We should all keep the error problem in mind when discussing econ stats.
I have a problem with the example equation x - y = 0. Everyone should recall from basic algebra that solving for two unknowns requires two equations. So it's not true that x - 1.00001y = 0 has the solution x = 100001, y = 100000. That's one possible solution; there are infinitely many others. As y increases, x must increase with it for the two terms to cancel out.
A couple of statistical techniques have been developed to help us reduce the problem of measurement error. Factor analysis and partial least squares (PLS) have been used by sociologists and psychologists for many years. Both groups assume a great deal of error in measurement when surveying people, their primary measurement tool. So they use several different questions to measure the same variable, then use factor analysis or PLS to analyze the data. These two techniques combine the varied responses to the similar questions into one response that reduces the measurement error. Economists don’t seem to like these techniques, but I think they would dramatically improve the accuracy of models if they used them. For example, they could use several different measures of prices, such as prices of consumer goods, gold, money supply and assets, and combine them via factor analysis to arrive at a more accurate index of price changes.
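A minimal sketch of the idea of combining several noisy measures of one variable, on simulated data rather than real price series. It uses the first principal component of the indicators' correlation matrix as a simple stand-in for factor analysis or PLS (not a full implementation of either); the sample size, noise level and variable names are all hypothetical.

```python
import numpy as np

# Hypothetical simulation: one unobserved "true" variable measured by several
# noisy indicators, like several survey questions about the same thing.
rng = np.random.default_rng(0)
n, k = 5000, 4
latent = rng.normal(size=n)
indicators = latent[:, None] + rng.normal(scale=1.0, size=(n, k))

# Standardize, then take the first principal component of the correlation
# matrix as a simple stand-in for a one-factor model.
z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
weights = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
composite = z @ weights

def abs_corr(a, b):
    # The sign of an eigenvector is arbitrary, so compare absolute correlations.
    return abs(np.corrcoef(a, b)[0, 1])

single = [abs_corr(indicators[:, j], latent) for j in range(k)]
combined = abs_corr(composite, latent)
print("best single indicator:", round(max(single), 3))
print("combined score:       ", round(combined, 3))
```

Because each indicator's noise is independent here, the combined score tracks the latent variable more closely than any single indicator; correlated measurement errors would weaken that advantage.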
Finally, we shouldn’t develop a fetish over data accuracy. We should view data as a guide to decision making, not as an end in itself. I’ve worked for several organizations that obsessed over the holy grail of data perfection, but never used the data for decision making. Data perfection is impossible in the real world. But imperfect data can still be a valuable guide because the accuracy of the data isn’t as important as the relationships between variables, such as that between price and demand. Of course, if the data has enough error, we can’t establish such relationships. The real problem with data analysis happens when we constantly change the data collection techniques. Then we have data that isn’t comparable from one time period to another.
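A minimal sketch of why a change in collection technique breaks comparability, and of one common remedy: ratio splicing at a period where both methods were recorded. The figures are hypothetical, and the approach assumes such an overlap period exists and that the ratio between the two methods stays constant.

```python
# Hypothetical series: the collection method changes, with one overlap year
# (2003) in which both the old and the new method were recorded.
old_method = {2001: 100.0, 2002: 102.0, 2003: 104.0}
new_method = {2003: 114.4, 2004: 116.7, 2005: 119.0}  # new method reads ~10% higher

def naive_concat(old, new):
    """Switch methods at the overlap year, creating an artificial jump."""
    out = {y: v for y, v in old.items() if y not in new}
    out.update(new)
    return out

def ratio_splice(old, new):
    """Rescale the old series to the new method's level at the first overlap year."""
    link_year = min(set(old) & set(new))
    factor = new[link_year] / old[link_year]
    out = {y: v * factor for y, v in old.items() if y < link_year}
    out.update(new)
    return out

def growth(series):
    """Year-over-year growth rates."""
    years = sorted(series)
    return {y: series[y] / series[p] - 1 for p, y in zip(years, years[1:])}

naive = growth(naive_concat(old_method, new_method))
spliced = growth(ratio_splice(old_method, new_method))
print("naive 2003 growth:  ", round(naive[2003], 3))
print("spliced 2003 growth:", round(spliced[2003], 3))
```

Splicing only works when an overlap exists; without one, the break in method cannot be separated from a real change.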
Published: August 17, 2006 10:28 AM
A very good article. Thank you.
A note to Mr. Hillary: banks today are not terribly concerned with the economic situation of their borrowers. They ask the questions, and fudge the answers, on many occasions. The real questions about lending are answered by the central bank: whether it is inflating the market or tightening the money supply.
Published: August 17, 2006 10:32 AM
This reminded me of a saying my boss used to repeat many years back, when our bank was going through a ruthlessly quantitative phase (right down to 'Measuring the effectiveness of training' in currency, no less!):
'If you can't measure it, you can't manage it,' he used to say at least three times a week.
And I used to mutter to myself, 'So the stuff we haven't a hope of measuring properly doesn't get managed at all, does it?'
With the intervening years, and having since developed an appreciation of Austrian economic thinking, and particularly its approach to policy (viz. have none), I realise he was right, but not in the narrow way he meant it.
Let me restate his inadvertently wise point now on behalf of your institute, and for the benefit of all politicians everywhere:
'If you can't measure it, you shouldn't even try to manage it'.
Published: August 17, 2006 10:38 AM
Good article. And since Morgenstern's book was published in 1963, and people still attach great significance to price changes and growth rates reported to two decimal places, that doesn't say a lot for economics education, does it?
Roger M: Morgenstern's example did involve the solution of 2 equations in each case.
Published: August 17, 2006 10:40 AM
Good article by Mr Bagus!
Though this subject has been well known to economists since at least the 1940s (see the Cowles Foundation papers, JASA articles, AMS articles, etc.), it is generally believed that not much can be done about it. The only remedy is to improve methodology and sampling techniques. Trygve Haavelmo's classic The Probability Approach in Econometrics (Econometrica, 1944) contains much useful and illuminating discussion of these issues.
Errors in the variables usually cause biased estimates, but the sign of the bias is generally unknown. Some techniques can, in theory, overcome this problem: Frisch's confluence analysis (no longer taught) and instrumental variables estimation, developed by Geary, Haavelmo, Reiersol and Sargan.
My internship at the Finnish Statistical Centre last summer showed me that compiling national accounts is actually quite an arbitrary business involving much discretion. Provisional estimates can differ considerably from the final ones, and a change in methodology can cause big level shifts and even changes in the trend.
Finally, not only the data can be suspect; the methods of estimation and data analysis must also be critically evaluated. Some so-called business cycles extracted from financial time series may be spurious: moving-average filtering can produce Yule-Slutsky effects that generate artificial cycles. Time series analysis of economic data is a tricky business, and a practitioner should know several types of estimation technique. For example, I used both time-domain (ACF and PACF graphs) and frequency-domain (spectral distribution function and periodogram graphs) methods in analyzing the effects of changes in inventories.
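A minimal sketch of the ingredient behind the Yule-Slutsky effect: smoothing pure white noise with a simple moving average gives the output strong serial correlation that the input never had, which is what would show up in an ACF plot. The window length and sample size are hypothetical; for a k-term moving average of white noise, the theoretical autocorrelation at lag h is (k - h)/k for h < k and zero beyond. Producing full quasi-periodic "cycles" takes repeated or combined filtering, which this sketch does not attempt.

```python
import random

def moving_average(xs, k):
    """Simple k-term moving average (only full windows are kept)."""
    return [sum(xs[i:i + k]) / k for i in range(len(xs) - k + 1)]

def autocorr(xs, lag):
    """Sample autocorrelation at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    den = sum(d * d for d in dev)
    return num / den

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # no cycles by construction
k = 5
smoothed = moving_average(noise, k)

for lag in range(1, 7):
    print(lag, round(autocorr(noise, lag), 3), round(autocorr(smoothed, lag), 3))
```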
Published: August 17, 2006 10:54 AM
A note:
In Morgenstern's example, the difference between the solutions was caused not by measurement error but by the near-collinearity of the linear equations. The determinant of the coefficient matrix was close to zero, so any numerical solution of these equations is unstable.
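A minimal sketch of the instability described above. The article itself is not reproduced here, so the system is an assumption reconstructed from the solution quoted earlier in the thread: x - y = 1 together with x - c*y = 0, solved once for c = 1.00001 and once for c = 0.99999. Exact fractions are used so that floating-point rounding plays no part; the helper `solve_2x2` is illustrative.

```python
from fractions import Fraction as F

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system: no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return det, x, y

# Assumed system: x - y = 1 and x - c*y = 0, for two nearly equal values of c.
for c in (F("1.00001"), F("0.99999")):
    det, x, y = solve_2x2(F(1), F(-1), F(1), F(1), -c, F(0))
    print(f"c = {float(c)}: det = {float(det):.5f}, x = {x}, y = {y}")
```

A change of 0.00002 in one coefficient flips the solution from about +100,000 to about -100,000, because the determinant 1 - c is within 0.00001 of zero.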
Published: August 17, 2006 11:06 AM
But the point of the article is that regardless of why inaccurate economic estimates occur, they do occur, and there is little or no attempt to give confidence intervals for the estimates presented to government or the public. Quite the contrary: economic data are reported every day with a spurious number of significant digits. The economics profession does not do a very good job (in fact it does no job at all) of informing people about the level of accuracy in economic data. Gosh, it's almost as if it's in the economics industry's interest to keep quiet about it.
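A minimal sketch of reporting an estimate together with a confidence interval, printing only as many decimals as the interval supports. The sample is hypothetical, the interval is a normal approximation that assumes independent observations, and the rounding rule is a rule-of-thumb assumption rather than an official standard. The interval reflects only sampling error; a systematic bias would not show up in it.

```python
import math
import statistics

def mean_with_ci(sample, z=1.96):
    """Sample mean with an approximate 95% normal-theory confidence interval."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m, m - z * se, m + z * se

def report(estimate, low, high):
    """Rule of thumb (assumption): keep decimals only down to the first
    nonzero digit of the interval's half-width."""
    half_width = (high - low) / 2
    if half_width <= 0:
        return f"{estimate} (no sampling variation in the data)"
    decimals = max(0, -math.floor(math.log10(half_width)))
    return f"{estimate:.{decimals}f} (95% CI {low:.{decimals}f} to {high:.{decimals}f})"

# Hypothetical monthly price changes, in percent.
monthly_changes = [0.21, 0.35, 0.18, 0.27, 0.30, 0.22, 0.25, 0.29]
print(report(*mean_with_ci(monthly_changes)))
```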
Published: August 17, 2006 1:05 PM
Economic analyses are FULL of "error terms," usually signified by lower-case epsilons. These error terms might look to the uninitiated like something related to what this article is about.
Of course, they aren't; they are merely "bit buckets" that collect the various amounts by which the estimates produced by models happen to differ from "real" data.
As noted, errors do NOT necessarily cancel each other out. In virtually all data-collection situations there are systematic biases that can shift all of the reported data in the same direction away from the truth. My favorite example (as yet unsubstantiated) is the reports of hours worked in the US from 1933 to 1939 (the years the National Industrial Recovery Act was in effect). I believe these numbers are systematically understated; by how much, we'll never know.
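A minimal sketch of the difference between errors that cancel and errors that don't, on hypothetical numbers (not actual 1930s data): mean-zero reporting errors wash out as reports are averaged, while a common understatement survives any amount of averaging.

```python
import random
import statistics

rng = random.Random(7)
true_value = 40.0   # hypothetical true average weekly hours
n = 10000           # hypothetical number of reports

# Random error only: each report is off by a mean-zero amount, so errors tend to cancel.
random_reports = [true_value + rng.gauss(0.0, 3.0) for _ in range(n)]

# Systematic error: the same kind of noise plus a common understatement of 2 hours.
bias = -2.0
biased_reports = [true_value + bias + rng.gauss(0.0, 3.0) for _ in range(n)]

print("mean with random error only:", round(statistics.mean(random_reports), 2))
print("mean with systematic bias:  ", round(statistics.mean(biased_reports), 2))
```

Nothing in the biased reports alone reveals the bias; detecting it needs an outside benchmark.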
Published: August 17, 2006 2:46 PM
The author and some posters suggest an error term so that we can know the accuracy of the data, but error terms exist only in statistical analyses, not in the collection of data. Error terms tell the analyst how far the statistical analysis is from the actual data. But when collecting pricing data for the CPI, for example, you have nothing to compare the collected data against, so you can't have an error term. The same is true for all of the data collected by the feds.
Since we suspect that measurement error exists in all of the data because of faulty measurement techniques and assumptions, we should use factor analysis and PLS to deal with it. For example, if you collected pricing data on consumer items, assets, gold, bonds and other things sensitive to inflation, and extracted the information common to all of the input variables into a single factor, you could be confident that the resulting factor would be an accurate reflection of prices. And you would have a measure of error, that is, the information that was not common to all of the input variables. But I've never seen anyone attempt it.
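A minimal sketch of the "measure of error" idea, on simulated data: in a one-factor model with three standardized indicators, each indicator's squared loading can be recovered from the correlations as r_ij * r_ik / r_jk (Spearman's triad formula), and one minus that is the share of its variance not common to the factor. This is the simplest one-factor case, not the full factor analysis or PLS procedure the comment has in mind; the noise levels are hypothetical.

```python
import numpy as np

# Hypothetical data: one common "inflation" factor seen through three
# indicators with increasing amounts of indicator-specific noise.
rng = np.random.default_rng(42)
n = 20000
factor = rng.normal(size=n)
noise_sd = [0.5, 1.0, 2.0]
x = np.column_stack([factor + rng.normal(scale=s, size=n) for s in noise_sd])

r = np.corrcoef(x, rowvar=False)

def squared_loading(i):
    """One-factor model, three indicators: loading_i**2 = r_ij * r_ik / r_jk."""
    j, k = [m for m in range(3) if m != i]
    return r[i, j] * r[i, k] / r[j, k]

uniqueness = [1 - squared_loading(i) for i in range(3)]
for s, u in zip(noise_sd, uniqueness):
    print(f"noise sd {s}: share of variance not common to the factor = {u:.2f}")
```

With these settings the theoretical unique shares are s**2 / (1 + s**2), i.e. 0.2, 0.5 and 0.8, so the noisiest indicator is flagged as the least reliable.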
Published: August 17, 2006 4:02 PM
Regarding data accuracy and decisions;
I work as an industrial hygienist. Part of that work involves evaluating potential chemical hazard risks, using data of variable quality. I can plug this data into formulas and even crude models to see whether an employee has a risk of exposure above a given level.
But unless the data is of uniform, high quality, it is only slightly better than a crapshoot, so I must follow my assessment with some kind of empirical observation (monitoring, sampling, whatever) - still imperfect, still not definitive, but at least a test of assumptions, and data that can be used to make subsequent decisions.
But if the empirical observations are incompletely made, subsequent determinations can be off by orders of magnitude - worse than worthless, because they instill false confidence.
I don't have to tell anybody reading this about economic data instilling false confidence.
Published: August 17, 2006 4:45 PM
I think this criticism is somewhat misplaced: the problem is not accuracy (and it remains to be seen whether the solutions to the models are unstable or chaotic) but relevancy.
They measure some stuff without even a vague idea of what this stuff means. CPI? GDP? What do they measure? How much money the government printed? Change in wealth? (In which direction?)
There is an old Russian anecdote which illustrates the problem:
Cockpit. The pilot says: "Navigator, what does the instrument read?" - "Six, sir!" - "Six what?" - "What instrument?"
Published: August 18, 2006 4:02 AM
A little clarification is in order;
Suppose an econometrician wants to estimate a relationship like y = a + b*x + e, where e is an error term satisfying the standard assumptions and y is a measurable "dependent" variable. Suppose that the "independent" variable is poorly measured or measured with a bias, so that x = z + v, where v is the measurement error or bias. Here z is the variable whose influence on y the econometrician actually wants to measure. From the standard theory of least squares we know that the estimated b coefficient is Cov(y, z+v)/Var(z+v), which has several unknown components. The problem is actually like omitted variable bias, but, as I wrote earlier, we generally don't know the direction or magnitude of the bias.
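Expanding that expression shows where the unknown components come from. This sketch assumes the true relation is y = a + b z + e (the comment writes it with x but says z is the variable of interest). The second line adds an assumption the comment does not make, classical measurement error with v uncorrelated with z and e; only then is the bias known to be toward zero. Without it, the extra covariance terms can push the estimate either way, which is the comment's point.

```latex
% Setup (assumed): true relation y = a + b z + e, observed regressor x = z + v
\operatorname{plim}\,\hat{b}
  = \frac{\operatorname{Cov}(y, x)}{\operatorname{Var}(x)}
  = \frac{b\,\operatorname{Var}(z) + b\,\operatorname{Cov}(z, v) + \operatorname{Cov}(e, z) + \operatorname{Cov}(e, v)}
         {\operatorname{Var}(z) + \operatorname{Var}(v) + 2\,\operatorname{Cov}(z, v)}

% Extra assumption: Cov(z, v) = Cov(e, z) = Cov(e, v) = 0 (classical measurement error)
\operatorname{plim}\,\hat{b} = b\,\frac{\operatorname{Var}(z)}{\operatorname{Var}(z) + \operatorname{Var}(v)}
```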
Some estimation techniques can, in theory, be used to overcome this problem; IV estimation is probably the most common method in that case.
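A minimal simulation of the IV idea under stated assumptions: the measurement error is classical, and the instrument w is a second, independent noisy measurement of z (one standard way to find an instrument in errors-in-variables problems, assumed here rather than taken from the comment). All parameter values are hypothetical. OLS is attenuated toward b * Var(z) / (Var(z) + Var(v)), while the simple IV ratio Cov(y, w) / Cov(x, w) recovers b.

```python
import random

def cov(a, b):
    """Sample covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(3)
n, a, b = 50000, 1.0, 2.0                        # hypothetical sample size and true parameters
z = [rng.gauss(0, 1) for _ in range(n)]          # true regressor (unobserved)
x = [zi + rng.gauss(0, 1) for zi in z]           # mismeasured regressor, x = z + v
w = [zi + rng.gauss(0, 1) for zi in z]           # instrument: independent second measurement of z
y = [a + b * zi + rng.gauss(0, 1) for zi in z]   # outcome

b_ols = cov(y, x) / cov(x, x)   # should be near b * 1 / (1 + 1) = 1.0 here
b_iv = cov(y, w) / cov(x, w)    # should be near b = 2.0
print(f"OLS slope: {b_ols:.3f}   IV slope: {b_iv:.3f}")
```

The IV estimate is only as good as the instrument: if w's error were correlated with v, the bias would return.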
As sampling and estimation techniques and the methodology of constructing these economic time series improve, it is theoretically possible that the problem of data quality will become much less serious.
Roger M wrote:
"error terms exist only in statistical analyses, not in the collection of data. Error terms tell the analyst how far the statistical analysis is from the actual data. But when collecting pricing data for CPI, for example, you have nothing to compare the collected data against. So you can't have an error term. The same is true for all of the data collected by the feds"
In theory you could always resample the whole data set and compare the results from different occasions, but in reality this is impossible.
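A related and practical approximation is the bootstrap: instead of collecting the data again, it resamples the one sample already collected, with replacement, and uses the spread of the recomputed statistic as an estimate of its standard error. This minimal sketch uses hypothetical price relatives and a simple geometric-mean (Jevons-type) elementary index as the statistic; that is an assumption, not how any official CPI is computed. The bootstrap captures only sampling variability, not systematic collection errors.

```python
import random
import statistics

def bootstrap_se(sample, stat, reps=2000, seed=0):
    """Standard error of `stat`, estimated by resampling `sample` with replacement."""
    rng = random.Random(seed)
    estimates = [stat(rng.choices(sample, k=len(sample))) for _ in range(reps)]
    return statistics.stdev(estimates)

# Hypothetical price relatives (this period's price / last period's price)
# for a small basket of items.
relatives = [1.02, 0.99, 1.05, 1.01, 1.00, 1.03, 0.98, 1.04, 1.02, 1.01]
index = statistics.geometric_mean(relatives)
se = bootstrap_se(relatives, statistics.geometric_mean)
print(f"index {index:.4f}, bootstrap standard error {se:.4f}")
```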
Averros wrote:
"I think this criticism is somewhat misplaced - the problem is not in accuracy (and it remains to be seen if the solutions to the models are unstable or chaotic), but rather in relevancy."
This is perhaps the main point of criticism, and one on which the Austrians could always do more. Orthodox economists, of course, might always resort to tactics like: "We don't have any other options except to use this data, since there are no alternatives readily available".
My own opinion is that we could do more through criticism of the concepts used in modern economic statistics and economics. The theoretical background of the Austrian School should be superior to the alternatives.
Published: August 18, 2006 7:32 AM
Here's a website that some may find interesting:
http://www.shadowstats.com
John Williams' Shadow Government Statistics is a monthly electronic newsletter that exposes and analyzes the flaws in current U.S. government data and reporting, as well as in certain private-sector numbers.
Published: August 18, 2006 9:19 PM
This is perhaps the main point of criticism, and one on which the Austrians could always do more. Orthodox economists, of course, might always resort to tactics like: "We don't have any other options except to use this data, since there are no alternatives readily available".
Yep, and I think it's worth pointing out that their position is awfully like another anecdote: the drunk who searched under the lamppost for the keys he had lost elsewhere. :)
Nah, honestly saying "we don't know" is better than a pretense of knowledge when all there is are some meaningless figures. But this honesty, of course, won't get one tenure.
Published: August 19, 2006 10:41 PM
Much of this information is obtained by the Census Bureau by asking companies to fill out forms and send them in. There are few or no checks on the accuracy of the data, and the forms have no tax consequences or real penalty for false data. Therefore, many of my compatriots who file these information requests spend little or no time filling them out. So aggregating bad information into large summaries of bad information gives you ... bad information!
Published: August 21, 2006 1:02 PM