Gaussian anthropology

Either everything is deeply interrelated or there are ideas whose time has come or, perhaps, I see patterns where there aren't any. I say this because whenever I read two or three interesting things in a row, no matter how diverse, I see ways in which they are saying the same thing. Perhaps I have a gift for synthesis.

So I recently finished Taleb's book The Black Swan, which is basically about the gross misuse of Gaussian distributions to predict a lot of (important) things that do not, in fact, behave on anything like a Gaussian curve. It's good stuff, an easy read (unlike myself, he has an editor who doesn't mind "chatty"), and it resonates. A lot of it is obviously right and a lot of the criticism I've read of it hinges on the fact that it can't supply a replacement for the Gaussian. As though being wrong but having something to do was better than acknowledging rates of error. I hear the cry, "But we have to do SOMETHING," often enough at work that I can commiserate. Of course we have to do something. We have to acknowledge and measure rates of error rather than assume they follow a curve that WE KNOW THEY DON'T.

Anyway, there's an underlying idea in there that runs even deeper than Taleb hints at: the idea that things "average out". Wedging data into a Gaussian curve is one way of assuming this, but apparently in the field of anthropology it's been happening (and is being addressed) as well.

I recently ran into an old friend who I haven't seen for some 30 years. He's an anthropologist (or at least he lectures on it). We chatted and of course I overshared and even got grabby about his area of expertise. He was researching "agency" in anthropology and so I asked for some papers so I could understand what it was about.

Now, keep in mind that I have very little background study in anthropology, so my cursory analysis is probably way off. (I certainly had trouble with many "terms of art" in the papers I read and occasionally made the almost certainly wrong assumption that they meant something like the plain English they seemed to. This doesn't work in any other field, so I don't know what I was thinking.) Anyway, agency is the idea that a culture does not follow a trajectory that is defined by a vector for progress or even maintenance. That is, if cultures develop to "improve", we expect to see universal adoption of "better" tools when they arrive. We don't actually see that. While there is an eventual adoption of many, there is also a lot of clinging to inferior tools for apragmatic reasons (I recently fought the emacs versus vi fight at work and had to deal with a whippersnapper who was hot on some new GUI thing, so I know how this works).

So the theory of agency says that you have to treat a culture as an aggregate of free-willed individuals who do not necessarily make pragmatic choices on average. They preserve suboptimal methods and tools for personal or political reasons. They like the texture of coarser corn meal because it's what they ate when they were kids. The process of chipping stone tools remains a skill that is revered and that reverence is sustained by them long after iron tools are available. And so on. Individual agency, then, impacts cultural development.

So here's the synthesis.

Gaussian curve users have as their core assumption that things average out. Yes there are outlying data points, but in general they cancel out and are rare and so things tend towards the mean. It turns out that's just not true about almost everything. It's true about average height. It's not true about average salaries.
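To make the height-versus-salary contrast concrete, here is a minimal sketch. It assumes heights are Gaussian and salaries follow a Pareto (power-law) distribution; the means, spreads, shape parameter and the helper name top_share are all made up for illustration and are not fitted to any real data.

```python
import random

def top_share(values, fraction=0.01):
    """Share of the total held by the largest `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 100_000

# Hypothetical "heights": Gaussian, mean 170 cm, sd 10 cm (illustrative numbers).
heights = [rng.gauss(170, 10) for _ in range(n)]

# Hypothetical "salaries": Pareto with shape 1.5, scaled to a 30,000 minimum.
salaries = [30_000 * rng.paretovariate(1.5) for _ in range(n)]

height_share = top_share(heights)
salary_share = top_share(salaries)

print(f"top 1% of heights hold  {height_share:.1%} of all the height")
print(f"top 1% of salaries hold {salary_share:.1%} of all the salary")
```

Under these assumptions the tallest 1% hold only a little more than 1% of the total height, while the best-paid 1% hold a large slice of the total salary. In the second case the "outliers" are a big part of the sum rather than noise that cancels out.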

Old school (non-agency thinking) anthropologists seem to similarly assume that the actions of individuals average out culturally. That the whims of agency are outliers on a curve of change that is essentially pragmatic.

For data like salaries (or stock fluctuation, or coastline length, or deaths in warfare, or whatever), the curve is essentially fractal. That is, the variation at a fine scale is self-affine (self-similar if you must, but actually self-affine) with the variation at larger scales. The bottom 50% of salaries follow a curve that looks remarkably similar to the top 1%, shifted down.
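One simple way to see the "looks the same at every scale" claim is to ask: given that a value is already past some threshold t, what are the odds it is also past 2t? The sketch below assumes a Pareto distribution with shape 1.5 as a stand-in for a power-law tail and compares it with a standard Gaussian. The function names and parameters are illustrative, and this only shows the scale-free tail, not the full self-affine picture.

```python
import math

def pareto_survival(x, alpha=1.5, x_min=1.0):
    """P(X > x) for a Pareto distribution with the given shape and minimum."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

def gaussian_survival(x, mean=0.0, sd=1.0):
    """P(X > x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

def doubling_odds(survival, t):
    """P(X > 2t | X > t): given you are past t, the chance you are past 2t."""
    return survival(2 * t) / survival(t)

thresholds = [1, 2, 4, 8]
pareto_odds = [doubling_odds(pareto_survival, t) for t in thresholds]
gauss_odds = [doubling_odds(gaussian_survival, t) for t in thresholds]

for t, p, g in zip(thresholds, pareto_odds, gauss_odds):
    print(f"t={t}: Pareto {p:.4f}   Gaussian {g:.3e}")
```

For the Pareto case the answer is the same at every threshold (it is 2 to the power of minus the shape parameter), which is the sense in which the top of the curve looks like the bottom, shifted. For the Gaussian the odds collapse as t grows, so each scale looks different from the last.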

And so it seems that anthropology is starting to discover that agency is self-affine with culture. That cultures reflect not the mean of all individual free-willed choice, but rather the whole spectrum of preference. The vagaries of personal preference and memory and politics and love are in fact reflected in the gross details of culture rather than submerged in a torrent of dissimilar data. On reflection this even seems obvious -- what can a culture BE except a novel expression of the finer details of its members? If it were some grey mean of behaviour, the similarity between cultures would surely be much greater than it is.

I'll close by again acknowledging that I may have found this correlation by virtue of failing to understand either idea sufficiently. But the concept of self-affine behaviour over differences in scale seems a natural result of the mathematics of non-Gaussian variation and certainly the impositions of free will cannot be considered Gaussian as they lack natural limits. They are necessarily, as per Taleb, from Extremistan.

How about that, huh? Gauss, fractals, and anthropology, and I didn't once mention memes (oops). Next time.


Feb. 22nd, 2009 05:53 pm (UTC)
Re: Frac'n fractals
I'm glad I got the salient points out of the material, though we've been wielding the word "agency" around the game design/theory world for a while and the intention is similar.

Re: Bi-modal Gaussian distribution and salaries. These refinements of the normal distribution fit more of the data but don't fit the full range. The probabilities at the extremes are much lower in the normal models than what we actually see, and so we tend to throw out the ones that are 6-sigma as "outliers" -- an example of the model eating its own tail and proving itself by discarding data that the model itself says is too unlikely.
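To put a rough number on "too unlikely", here is a small sketch comparing the chance of a 6-sigma event under a Gaussian with the chance under a power law measured in units of its own standard deviation. The Pareto shape of 3 is an assumption chosen only because it gives a finite variance; it is not fitted to salaries or anything else.

```python
import math

def gaussian_tail(k):
    """One-sided P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Hypothetical Pareto with minimum 1 and shape 3 (assumed, for illustration).
alpha = 3.0
mean = alpha / (alpha - 1)                               # Pareto mean
sd = math.sqrt(alpha / ((alpha - 1) ** 2 * (alpha - 2)))  # Pareto std deviation

x6 = mean + 6 * sd       # the point six standard deviations above the mean
p_pareto = x6 ** -alpha  # P(X > x6) for this Pareto
p_gauss = gaussian_tail(6)

print(f"Gaussian: about 1 in {1 / p_gauss:,.0f}")
print(f"Pareto:   about 1 in {1 / p_pareto:,.0f}")
```

The Gaussian puts a 6-sigma event at roughly one in a billion, while this particular power law puts it at a few in a thousand. If the data really are heavy-tailed, discarding those points on the Gaussian's say-so throws away exactly the observations that would have shown the model was wrong.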

I don't know what you actually do to model it though. Certainly some variation of a power law is more correct, but the unpredictability of many things is too extreme to know if you've got it pinned down -- the past doesn't give you enough information about the future when you're in the data set. The fact that the sun comes up every day leads you to believe reasonably that it will tomorrow and every day, but when you put yourself in the data set you would be tempted to use the same information to conclude your immortality.

As for books, I only think coherently for a couple thousand words tops, then I'm done. A collection of my wacky and wholly uncited essays maybe.
Feb. 22nd, 2009 06:22 pm (UTC)
Re: Frac'n fractals
'Outliers'. Ok, now you've touched a nerve. There is a lot of misunderstanding with respect to why outliers are discarded from an analysis. If a value falls far from the expected range of values it is typically assumed to be due to one of two things - either it is an error or it is something that is very different from the things that were being studied. In the latter case discarding the outlier is not a refusal to acknowledge that it exists, but rather, recognition that it requires reclassification. In your 'salaries' example, in reality there are likely to be four or five modes, representing 1. working joes, 2. urban professionals, 3. corporate directors/hockey players/drug dealers, 4. corporate owners, and 5. oil/computer tycoons and royalty. If you're gathering data on salaries of urban professionals in Vancouver and you found one individual who was 6 sigma above the mean, then you would be wise to conclude that that individual should not be classified as an urban professional.

So, I don't think the existence of outliers is a flaw of the Gaussian model; on the contrary, it is one of the more valuable aspects of it (as I frequently tell my students - "the outliers are often the most interesting"). But how this overall pattern came into being may be more comprehensively explained through fractal geometry, which as I understand it is the product of 'chaos'.
Feb. 22nd, 2009 07:04 pm (UTC)
Re: Frac'n fractals
That all makes good sense to me, and certainly being able to change your model to account for outliers is valuable academically, but there's still a tail-eating problem there -- the assumption that the Gaussian model still holds, but that you just have a new category with a new curve. I guess in the end you're necessarily approximating what's really a massively multi-variate and dynamic (and therefore fundamentally intractable) function.

The biggest problem I encounter with Gaussian modeling, though, is not in academia where revising the model is not just an option but potentially a publication. Rather in industry where it is just not acceptable to question the model, and outliers are discarded as necessarily part of bad data collection or some other non-systemic feature. Far better, as you say, to identify it as something certainly interesting and then focus on it.

However, even given what you've said, you have to consider the possibility that an outlier indicates a broken statistical model -- choosing only between error and "out of scope" misses significant alternatives and assumes the Gaussian fitting rather than testing it. Especially if you keep finding new categories.
Feb. 22nd, 2009 07:33 pm (UTC)
Re: Frac'n fractals
Yes, I have encountered situations where someone in 'industry' has tried to use a Gaussian model to discredit the results of my work. Specifically, they took issue with the fact that a two week survey I conducted resulted in the discovery of more archaeological sites than 7 years of previous survey, by their researchers, had yielded. I don't think the model is inherently flawed, though; the flaw is in many people's use and understanding of it. I find it a great tool for exploring data and identifying things of interest. But having said that, it is just a model and by nature is a simplification of reality. Its focus on central tendency limits its utility in modeling. This is recognized by many statisticians (hence the development of Bayesian statistics and Chaos theory).


Brad J. Murray
