Statistical Models: Theory and Practice. By David A. Freedman. Cambridge University Press.
Through this collection, Freedman offers an integrated synthesis of his views on causal inference. He explores the foundations and limitations of statistical modeling and evaluates research in political science, public policy, law, and epidemiology. Freedman argues that many new technical approaches to statistical modeling constitute not progress, but regress, and he shows why these methods are not reliable. But the tide is turning. Many social scientists now agree that statistical technique cannot substitute for good research design and subject matter knowledge.
Endogeneity in Probit Response Models: The usual Heckman two-step procedure should not be used for removing endogeneity bias in probit regression. From a theoretical perspective this procedure is unsatisfactory, and likelihood methods are superior. Unfortunately, standard software packages do a poor job of maximizing the biprobit likelihood function, even if the number of covariates is small.
For instance, the hypothesis that observations are independent cannot be tested against the general alternative that they are dependent with power that exceeds the level of the test.
Thus, the basic assumptions of regression cannot be validated from data. On Types of Scientific Inquiry: The Role of Qualitative Reasoning: Causal inference can be strengthened in fields ranging from epidemiology to political science by linking statistical analysis to qualitative knowledge. Examples from epidemiology show that substantial progress can derive from informal reasoning, qualitative insights, and the creation of novel data sets that require deep substantive understanding and a great expenditure of effort and shoe leather.
Scientific progress depends on refuting conventional ideas if they are wrong, developing new ideas that are better, and testing the new ideas as well as the old ones. Qualitative evidence can play a key role in all three tasks. Freedman presents in this book the foundations of statistical models and their limitations for causal inference.
Examples, drawn from political science, public policy, law, and epidemiology, are real and important. A statistical model is a set of equations that relate observable data to underlying parameters.
The parameters are supposed to characterize the real world.
Formulating a statistical model requires assumptions that are routinely untested. Indeed, some are untestable in principle, as Freedman shows in this volume. Assumptions are involved in choosing which parameters to include, the functional relationship between the data and the parameters, and how chance enters the model. It is common to assume that the data are a simple function of one or more parameters, plus random error.
Linear regression is often used to estimate those parameters. More complicated models are increasingly common, but all models are limited by the validity of the assumptions on which they ride. Modeling assumptions—rarely examined or even enunciated—fail in ways that undermine model-based causal inference. Because of their unrealistic assumptions, many new techniques constitute not progress but regress.
His goal was to offer an integrated presentation of his views on applied statistics, with case studies from the social and health sciences, and to encourage discussion of those views.
The text has been lightly edited; in a few cases chapter titles have been altered. Citations to the original publications are given on the first page of each chapter and in the reference list, which has been consolidated at the end. When available, references to unpublished articles have been updated with the published versions. Many people deserve acknowledgment for their roles in bringing these ideas and this book to life, including the original co-authors and acknowledged reviewers.
We thank Janet Macher for her assistance in editing the manuscript. Colleagues at Berkeley and elsewhere contributed valuable suggestions. Ed Parsons of Cambridge University Press helped shape the project and moved it to press with amazing speed.

David Collier, Jasjeet S. Sekhon, and Philip B. Stark

Drawing sound causal inferences from observational data is a central goal in social science. How to do so is controversial.
Technical approaches based on statistical models—graphical models, non-parametric structural equation models, instrumental variable estimators, hierarchical Bayesian models, and the like—are proliferating. But David Freedman has long argued that these methods are not reliable. He demonstrated repeatedly that it can be better to rely on subject matter expertise and to exploit natural variation to mitigate confounding and rule out competing explanations. When Freedman first enunciated this position decades ago, many were skeptical.
An increasing number of social scientists now agree that statistical technique cannot substitute for good research design and subject-matter knowledge. This view is particularly common among those who understand the mathematics and have on-the-ground experience.
In contrast, advocates of statistical modeling sometimes claim that their methods can salvage poor research design or low-quality data. Some suggest that their algorithms are general-purpose inference engines: Put in data, turn the crank, out come quantitative causal relationships, no knowledge of the subject required.
Modeling assumptions are made primarily for mathematical convenience, not for verisimilitude. The assumptions can be true or false—usually false. When the assumptions are true, theorems about the methods hold.
When the assumptions are false, the theorems do not apply. How well do the methods behave then? Do they violate common sense? Freedman asked and answered these questions, again and again. He showed that sound causal inferences cannot be produced by mechanical modeling. Rather, they require shoe leather: careful empirical work tailored to the subject and the research question, informed both by subject-matter knowledge and statistical principles.
Witness his mature perspective: Causal inferences can be drawn from nonexperimental data. However, no mechanical rules can be laid down for the activity.
Since Hume, that is almost a truism. Instead, causal inference seems to require an enormous investment of skill, intelligence, and hard work. Many convergent lines of evidence must be developed. Natural variation needs to be identified and exploited. Data must be collected. Confounders need to be considered. Alternative explanations have to be exhaustively tested. Before anything else, the right question needs to be framed. Naturally, there is a desire to substitute intellectual capital for labor.
That is why investigators try to base causal inference on statistical models. The technology is relatively easy to use, and promises to open a wide variety of questions to the research effort. However, the appearance of methodological rigor can be deceptive. The models themselves demand critical scrutiny. Mathematical equations are used to adjust for confounding and other sources of bias.
These equations may appear formidably precise, but they typically derive from many somewhat arbitrary choices. Which variables to enter in the regression? What functional form to use? What assumptions to make about parameters and error terms? These choices are seldom dictated either by data or prior scientific knowledge.
That is why judgment is so critical, the opportunity for error so large, and the number of successful applications so limited. But some scientists ignore the design and instead use regression to analyze data from randomized experiments.
Chapters 12 and 13 show that the result is generally unsound. To assess how close an observational study is to an experiment requires hard work and subject-matter knowledge.
Even without a real or natural experiment, a scientist with sufficient expertise and field experience may be able to combine case studies and other observational data to rule out possible confounders and make sound inferences. Freedman was convinced by dozens of causal inferences from observational data—but not hundreds. Chapter 20 gives examples, primarily from epidemiology, and considers the implications for social science. Only shoe leather and wisdom can tell good assumptions from bad ones or rule out confounders without deliberate randomization and intervention.
These resources are scarce. Researchers who rely on observational data need qualitative and quantitative evidence, including case studies. They also need to be mindful of statistical principles and alert to anomalies, which can suggest sharp research questions. No single tool is best: They must find a combination suited to the particulars of the problem.
Freedman taught students—and researchers—to evaluate the quality of information and the structure of empirical arguments. He emphasized critical thinking over technical wizardry.
This focus shines through two influential textbooks. His widely acclaimed undergraduate text, Statistics, transformed statistical pedagogy. Statistical Models: Theory and Practice, written at the advanced undergraduate and graduate level, presents standard techniques in statistical modeling and explains their shortcomings. These texts illuminate the sometimes tenuous relationship between statistical theory and scientific applications by taking apart serious examples.
They show when, why, and by how much statistical modeling is likely to fail. At Berkeley, we have lab sessions where students use the computer to analyze data. The book also has many exercises: some are mathematical and some are hypothetical, providing the analogs of lemmas and counter-examples in a more conventional treatment. On the other hand, many of the exercises are based on actual studies. Answers to most of the exercises are at the back of the book.
Beyond exercises and labs, students at Berkeley write papers during the semester. Instructions for projects are also available from the publisher. A text is defined in part by what it chooses to discuss, and in part by what it chooses to ignore; the topics of interest cannot all be covered in one book, no matter how thick.
My objective was to explain how practitioners infer causation from association, with the bootstrap as a counterpoint to the usual asymptotics.
Examining the logic of the enterprise is crucial, and that takes time. If a favorite technique has been slighted, perhaps this reasoning will make amends. There is enough material in the book for 15–20 weeks of lectures and discussion at the undergraduate level, or 10–15 weeks at the graduate level. With undergraduates on the semester system, I cover chapters 1–7, and introduce simultaneity (sections of chapter 9). This usually takes 13 weeks. If things go quickly, I do the bootstrap (chapter 8) and the examples in chapter 9. On a quarter system with ten-week terms, I would skip the student presentations and chapters 8–9; the bivariate probit model in chapter 7 could also be dispensed with. During the last two weeks of a semester, students present their projects, or discuss them with me in office hours.
I often have a review period on the last day of class.
For a graduate course, I supplement the material with additional case studies and discussion of technique. The revised text organizes the chapters somewhat differently, which makes the teaching much easier. The exposition has been improved in a number of other ways, without, I hope, introducing new difficulties. There are many new examples and exercises. The students in my courses were helpful and supportive.
Regression models can be used for different purposes: (i) to summarize data, (ii) to predict the future, and (iii) to predict the results of interventions. The third—causal inference—is the most interesting and the most slippery. It will be our focus.
For background, this section covers some basic principles of study design.
Causal inferences are made from observational studies, natural experiments, and randomized controlled experiments. When using observational (non-experimental) data to make causal inferences, the key problem is confounding. Sometimes this problem is handled by subdividing the study population (stratification, also called cross-tabulation), and sometimes by modeling.
These strategies have various strengths and weaknesses, which need to be explored. In a randomized controlled experiment, the investigators assign subjects to treatment or control, for instance by the toss of a coin. Up to random error, the coin balances the two groups with respect to all relevant factors other than treatment. Differences between the treatment group and the control group are therefore due to treatment. That is why causation is relatively easy to infer from experimental data.
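The balancing effect of randomization can be illustrated with a small simulation. The numbers here are hypothetical, not from any study: each of 10,000 subjects carries a binary background factor that might affect the outcome, and a coin toss assigns each subject to treatment or control.

```python
import random

# Hypothetical population: 10,000 subjects, ~30% carry a background factor
# (say, a prior health condition) that could affect the response.
random.seed(0)
subjects = [random.random() < 0.3 for _ in range(10_000)]

# Assign each subject to treatment or control by a fair coin toss.
treatment, control = [], []
for has_factor in subjects:
    (treatment if random.random() < 0.5 else control).append(has_factor)

# Up to random error, the coin balances the factor across the two groups:
# the prevalence differs only by an amount of order 1/sqrt(group size).
t_rate = sum(treatment) / len(treatment)
c_rate = sum(control) / len(control)
print(f"factor prevalence: treatment {t_rate:.3f}, control {c_rate:.3f}")
```

The same logic balances every background factor at once, measured or not, which is what makes the experimental comparison credible.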
However, experiments tend to be expensive, and may be impossible for ethical or practical reasons. Then statisticians turn to observational studies.
In an observational study, it is the subjects who assign themselves to the different groups. The investigators just watch what happens. Studies on the effects of smoking, for instance, are necessarily observational. However, the treatment-control terminology is still used. The investigators compare smokers (the treatment group, also called the exposed group) with nonsmokers (the control group) to determine the effect of smoking.
Smokers come off badly in comparison with nonsmokers. Heart attacks, lung cancer, and many other diseases are more common among smokers. There is a strong association between smoking and disease. If cigarettes cause disease, that explains the association: death rates are higher for smokers because cigarettes kill. Generally, association is circumstantial evidence for causation. However, the proof is incomplete.
There may be some hidden confounding factor which makes people smoke and also makes them sick. If so, there is no point in quitting: that will not change the hidden factor. Association is not the same as causation. Confounding means a difference between the treatment and control groups—other than the treatment—which affects the response being studied. Typically, a confounder is a third variable which is associated with exposure and influences the risk of disease.
Statisticians like Joseph Berkson and R. A. Fisher did not believe the evidence against cigarettes, and suggested possible confounding variables. Epidemiologists (including Richard Doll and Bradford Hill in England, as well as Wynder, Graham, Hammond, Horn, and Kahn in the United States) ran careful observational studies to show these alternative explanations were not plausible.
Taken together, the studies make a powerful case that smoking causes heart attacks, lung cancer, and other diseases. If you give up smoking, you will live longer. Epidemiological studies often make comparisons separately for smaller and more homogeneous groups, assuming that within these groups, subjects have been assigned to treatment or control as if by randomization. For example, a crude comparison of death rates among smokers and nonsmokers could be misleading if smokers are disproportionately male, because men are more likely than women to have heart disease and cancer.
Gender is therefore a confounder. Age is another confounder. Older people have different smoking habits, and are more at risk for heart disease and cancer. So the comparison between smokers and nonsmokers was made separately by gender and age: for example, male smokers age 55–59 were compared to male nonsmokers in the same age group. This controls for gender and age. Air pollution would be a confounder, if air pollution causes lung cancer and smokers live in more polluted environments.
To control for this confounder, epidemiologists made comparisons separately in urban, suburban, and rural areas. In the end, explanations for health effects of smoking in terms of confounders became very, very implausible. Of course, as we control for more and more variables this way, study groups get smaller and smaller, leaving more and more room for chance effects. This is a problem with cross-tabulation as a method for dealing with confounders, and a reason for using statistical models.
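A minimal sketch of stratification, with invented counts (these numbers are hypothetical, not from the smoking studies): men have a higher baseline death rate and are overrepresented among the smokers, so pooling the genders exaggerates the smoker-nonsmoker difference, while comparing within each gender controls for the confounder.

```python
# Hypothetical counts: (deaths, people) cross-tabulated by group and gender.
data = {
    ("smoker",    "male"):   (180, 9000),
    ("smoker",    "female"): (10,  1000),
    ("nonsmoker", "male"):   (15,  1000),
    ("nonsmoker", "female"): (45,  9000),
}

def rate_per_1000(deaths, people):
    return 1000 * deaths / people

# Crude comparison: pool the genders. Smokers look far worse (19 vs 6),
# partly because the smokers are mostly men.
for group in ("smoker", "nonsmoker"):
    deaths = sum(d for (g, _), (d, n) in data.items() if g == group)
    people = sum(n for (g, _), (d, n) in data.items() if g == group)
    print(group, round(rate_per_1000(deaths, people), 1))

# Stratified comparison: within each gender the smokers still do worse,
# but by less than the crude comparison suggests.
for gender in ("male", "female"):
    s = rate_per_1000(*data[("smoker", gender)])
    ns = rate_per_1000(*data[("nonsmoker", gender)])
    print(gender, round(s, 1), "vs", round(ns, 1))
```

As the text notes, each extra stratification variable splits the strata further, so real studies quickly run out of subjects per cell.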
Furthermore, most observational studies are less compelling than the ones on smoking. The following slightly artificial example illustrates the problem.
Example 1. In cross-national comparisons, there is a striking correlation between the number of telephone lines per capita in a country and the death rate from breast cancer in that country. This is not because talking on the telephone causes cancer. Richer countries have more phones and higher cancer rates. The probable explanation for the excess cancer risk is that women in richer countries have fewer children. Pregnancy—especially early first pregnancy—is protective.
Differences in diet and other lifestyle factors across countries may also play some role. Randomized controlled experiments minimize the problem of confounding. That is why causal inferences from randomized controlled experiments are stronger than those from observational studies.
With observational studies of causation, you always have to worry about confounding. What were the treatment and control groups? How were they different, apart from treatment? What adjustments were made to take care of the differences? Are these adjustments sensible? The rest of this chapter will discuss examples: the HIP trial of mammography, Snow on cholera, and the causes of poverty.
The first example is the HIP trial of mammography, a screening test for breast cancer. If the cancer is detected early enough—before it spreads—chances of successful treatment are better. Does mammography speed up detection by enough to matter? The HIP study was done in the early 1960s; there were about half a dozen other trials as well, and by the late 1980s mammography had gained general acceptance. HIP (the Health Insurance Plan of Greater New York) was a group medical practice which had at the time some 700,000 members. Subjects in the experiment were 62,000 women age 40–64, members of HIP, who were randomized to treatment or control. Women in the treatment group were invited to come in for screening.
The control group continued to receive usual health care. Results from the first 5 years of followup are shown in table 1. Death rates per 1000 women are shown, so groups of different sizes can be compared.
Table 1. HIP data. Group sizes (rounded), deaths in 5 years of followup, and death rates per 1000 women randomized.

                             Breast cancer       All other causes
                    Size     Deaths    Rate            Rate
  Treatment
    Screened      20,200       23       1.1             21
    Refused       10,800       16       1.5             38
    Total         31,000       39       1.3             27
  Control         31,000       63       2.0             28

Which rates show the efficacy of treatment? It seems natural to compare those who accepted screening to those who refused. However, this is an observational comparison, even though it occurs in the middle of an experiment. The investigators decided which subjects would be invited to screening, but it is the subjects themselves who decided whether or not to accept the invitation.
Richer and better-educated subjects were more likely to participate than those who were poorer and less well educated. Furthermore, breast cancer (unlike most other diseases) hits the rich harder than the poor. Social status is therefore a confounder—a factor associated with the outcome and with the decision to accept screening. The tip-off is the death rate from other causes (not breast cancer) in the last column of table 1. There is a big difference between those who accept screening and those who refuse.
The refusers have almost double the risk of those who accept. There must be other differences between those who accept screening and those who refuse, in order to account for the doubling in the risk of death from other causes—because screening has no effect on the risk. One major difference is social status. It is the richer women who come in for screening. Richer women are less vulnerable to other diseases but more vulnerable to breast cancer. So the comparison of those who accept screening with those who refuse is biased, and the bias is against screening.
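The rates in table 1 are simple arithmetic: deaths divided by group size, times 1000. A short sketch, assuming the rounded group sizes commonly reported for the HIP trial (the 20,200 / 10,800 split of the 31,000-woman treatment group is an assumption of this example, not stated in the surrounding text):

```python
# group: (size, breast cancer deaths in 5 years of followup)
groups = {
    "screened":        (20_200, 23),
    "refused":         (10_800, 16),
    "treatment total": (31_000, 39),
    "control":         (31_000, 63),
}

for name, (size, deaths) in groups.items():
    rate = 1000 * deaths / size
    print(f"{name}: {rate:.1f} breast cancer deaths per 1000")
```

The experimental comparison is treatment total versus control, and it is unbiased: everyone invited to screening, whether or not they came in, is counted against everyone not invited. The screened-versus-refused comparison looks similar but is biased by self-selection, as the text explains.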