A long, long time ago, in a state not too far away, I studied statistics and survey research methodology. This was when I was at Duke University’s public policy program (1985-1987) and my professor was John McConahay. We students spent an excruciating semester on statistics, running programs in the dark of night in the dark ages of computers, when a stray comma in a formula kept us up all night in the computer lab. Then I spent a year with him on survey research methodology (including one double-blind test of two beers—there’s nothing like learning by doing!). In this two-semester course I learned lessons that served me well when, in 1996, I helped run a national deliberative opinion poll. We had one of the most respected organizations in the country running the survey part (the University of Chicago’s National Opinion Research Center) as well as counsel from leading figures in the field. For the few years that I worked with Jim Fishkin on deliberative polling, I came to appreciate deeply how to get the best take one could on, to borrow a phrase from Derrida, the shadow of the phantom that is public opinion.
I also came to feel a sense of horror when bad surveys are taken as good, authoritative, and meaningful ones. Some of these are what survey researchers call SLOPs, self-selected listener opinion polls. These are worse than sloppy; they trot out results that look real and generalizable because they are quantitative, but in fact they only say what those who bothered to answer the question think. (Or, if the questions are leading, not even that.) What this unrepresentative group thinks can absolutely NOT be taken to represent what the whole thinks.
Basic lessons I learned from studying at Duke and working on the deliberative poll include these:
- The best way to know what a group thinks is to survey everyone in the group.
- But if you can’t do that, you may need to survey a sample of the whole. For this sample to represent the whole group, it needs to mirror the whole.
- The best way for a sample to mirror the larger population is for it to be generated randomly, certainly not by quotas. Market researchers often use quotas, e.g. x number of blacks, whites, Hispanics, women, men, dogs, whatever. But quotas of obvious things like these miss less obvious distinctions that might skew the results.
- Getting a random sample of a whole is tricky and requires much careful effort.
- Usually the breaking point for sample size, no matter how large the population, is about 300. A population of 3 million can be sampled nearly as well by 300 as can a population of 30,000. Bigger is better, meaning a smaller margin of error, but the quality of samples under 300 degrades quickly for large populations.
- The order and wording of questions is crucial.
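The sample-size point can be checked against the standard margin-of-error formula with a finite population correction. This is a sketch of the textbook calculation, not anything drawn from the polls discussed here; the function name and the 95%-confidence setup are my own illustration.

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Approximate margin of error at 95% confidence for a simple
    random sample of size n from a population of size N.
    p=0.5 is the most conservative assumption about the true proportion."""
    base = z * math.sqrt(p * (1 - p) / n)       # infinite-population margin
    fpc = math.sqrt((N - n) / (N - 1))          # finite population correction
    return base * fpc

# A sample of 300 performs almost identically for very different populations:
print(round(margin_of_error(300, 30_000), 4))     # ~0.056
print(round(margin_of_error(300, 3_000_000), 4))  # ~0.057
# But a smaller sample widens the margin quickly:
print(round(margin_of_error(100, 3_000_000), 4))  # ~0.098
```

The finite population correction is why population size barely matters once the sample is random: the 300-person margin of error is roughly five and a half percentage points whether the population is thirty thousand or three million.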
The above criteria are important for any attempt to generalize from a sample to a larger population. They are especially important in reputational rankings. If the sample is skewed at all at the beginning, if the sample is generated from a set of assumptions about “what counts as good,” then the outcome is bound to be generated from these initial assumptions. It will produce “what is good” based upon what it thought was good in the first place. And the only defense of those initial assumptions is the very conclusion the ranking set out to prove. This is the worst kind of circular thinking, and it may be invisible even to the most educated university administrators.
In these post-metaphysical times, when we want to assess, say, graduate programs in any of the liberal arts, there are a number of ways to proceed. Like the recent National Research Council survey, we could get a sense of reputation, productivity, graduate student success, etc. (For all its flaws, this was a valiant effort.) Productivity and student success are fairly (but not entirely) objective matters; reputation, however, is thoroughly subjective—but very meaningful, since really it is “the tribunal of public opinion” that matters at the end of the day, in philosophy as well as democracy. There are no external standards that tell us which arguments are most compelling. At the end of the day, what matters is whether or not we find them compelling. So who is doing “the best” work on Heidegger today? To find out, I would consult as many people as I could who work on Heidegger today. Then I might have something like the pragmatic (and provisional) truth of the matter.
To approximate this kind of truth, the NRC asked the chair of every single graduate program in the country to give their views on what were the best programs and faculty. They did not go to “the top” schools’ chairs and ask what they thought, because of course this would beg the question of which schools were top ones. (And of course this is the fatal flaw of the Leiter reports.) They asked everyone, or at least every chair, which strikes me as a fairly (though not completely) representative way to get a picture of the whole.
If the American Philosophical Association were to do its own survey of graduate programs in the United States, it would be best if it surveyed every single one of its members and asked them to indicate which programs and faculty are doing important work and providing good graduate education in their own fields. But more importantly, it should make public, in one place, placement rates, student support, and faculty CVs, as well as any other information that would help those interested know the strengths of all the programs in the country. This would be a real service to the profession.