Big Data Will Blind You

Not all of us are scientists, but all of us today are consumers of science. And I mean science, not technology. When we want to lose weight, or make more money, or find that perfect someone, we don’t go to gurus, and we don’t go with our guts. We look at the latest studies.

It’s been said that Generation X has a deep need for data. Certainly a lot of people my age long ago lost our last vestiges of idealism, and are most interested in knowing, as pragmatically as possible, exactly what works and what doesn’t. We no longer believe in Dr. Spock’s intuitions or Oprah’s platitudes. We want to see what science says. We’re only interested in practical, proven methods. We haven’t given up trying to explain the world, but we’ve stopped trying to make beautiful, abstract theories workable. In the same vein, companies like Amazon, Google and Facebook are proud to call themselves ‘data-driven’: they make no claim to being led by ‘visionaries’, but act based on rigorous analysis of consumer activity. (Of course, there is a minority of companies, such as Apple, which do claim to be led by visionaries, but these are the exception, and their stock prices are more volatile.)

Part of this zeitgeist is the modern tech industry excitement about the possibilities of ‘Big Data’, a rapidly-emerging state in which we’ll have so much data on so many people and so many financial transactions that we’ll cross some kind of singularity into perfect knowledge, a threshold beyond which we’ll find new markets, new products, and vast new vistas of profit.

Maybe so. But there’s a big pitfall that comes with Big Data. If you’re given a big pile of facts, you start to imagine that you know more than you did before; that you can just crunch some equations and run some statistics, and the numbers will tell you what to do. You’re tempted to believe that you don’t need to get the ‘how’ and ‘why’ of things, as long as you have enough ‘what’.

A little knowledge is a dangerous thing. But knowledge without understanding is even more dangerous. Here are some examples of why.

Object Lesson: Ejectives and Altitude

It was recently discovered that languages spoken at high altitudes are more likely to have ejectives (a type of consonant which is spoken with a certain forcefulness of air pressure). This isn’t a hard and fast correlation, but it’s strongly statistically significant. Why should this be?

The author of the paper, an anthropologist at the University of Miami, suggests that it’s because of the thin air at high altitudes. It’s claimed that ejective consonants are easier to hear in low pressure areas, and the closure of the glottis during pronunciation assists the speaker in remaining hydrated.

Are you suspicious of this conclusion? You should be. The author has noticed a strong correlation, and taken a record-breaking high-flying leap to a conclusion. He has not gone out and tested hydration levels of various speakers of these languages, nor checked out how well ejectives can be heard versus other sounds.

In fact, ejectives are slightly easier to hear than non-ejectives, but they’re not the easiest consonants to hear. By far the most audible consonants are sibilants, like ‘s’. (You can’t whisper an ‘s’.) Why don’t these languages have more sibilants? As for preventing dehydration, you lose most moisture when you’re pronouncing vowels, and your mouth is wide open; so you’d expect fewer vowels, not more ejectives. After all, when you speak, vowels make up about 80-90% of the length of a word.

Nor has he checked to see if there are other correlations of linguistic features with altitude. Turns out there are! High-altitude languages also tend to have objects before verbs in their sentences, and there is also a relationship between the order of verbs and objects and the order of nouns and adjectives. What are we to make of this, then? Does high altitude encourage some kinds of syntax, perhaps because of its effect on brain oxygenation? Perhaps air-starved brains are more likely to push their verbs to the ends of sentences. Or maybe the speakers of these languages rush to get the all-important predicate nouns out of their mouths before they run out of breath.

So… Many… Correlations…

That’s nonsense, of course. But in this situation, and many others, people are inclined to think that correlation must equal causation. For example, recently researchers at UPenn found (among many other fascinating things) that people who talk about sports on Facebook are less likely to be neurotic. The researchers then went on to speculate that maybe playing sports helps with depression, or something like that. Well, certainly other (more careful) scientists have shown that physical activity helps with depression. But I notice that the methodology of the UPenn study makes no distinction between playing sports and watching sports. Personally, given the choice between neuroticism and watching football, I’ll take my chances with the neuroticism. Better the devil you know… But again, correlation does NOT mean causation.

So if there’s no causation involved (if high altitude doesn’t necessarily cause ejectives, and watching sports doesn’t necessarily make you happy), what’s really going on? What’s causing the correlation? Well, as far as the ejectives go, Mark Liberman at Language Log points out that there are hundreds of linguistic features, and thousands of languages; and in a data-rich environment like that, just by chance, there are bound to be some correlations that don’t have any causal link at all. To understand this intuitively, suppose there are a dozen children on a playground, of whom six are girls, and all the girls are in the sandbox. In this case, you might be justified in thinking that boys are avoiding the sandbox for some reason. But if instead there are a hundred children, of whom three are wearing black shoes, and two of those are in the sandbox, a causal relationship between black shoes and sandboxes is much less likely. Come back in ten minutes and maybe just the three kids in red shirts will be in the sandbox. There are just too many variations of clothing, and too many possible groupings among so many children, to draw any conclusions from one coincidence.
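If you’d like to see this effect for yourself, here is a minimal sketch in Python. Everything in it is made up for illustration: the function name, the 500 languages, the 200 features, and the 15% frequency are my own assumptions, not figures from Liberman or from the paper. Each invented ‘feature’ is assigned to languages by coin flip, with no connection at all to altitude, and then tested against altitude with an ordinary two-proportion z-test at the usual 5% level.

```python
import math
import random


def count_spurious_correlations(n_languages=500, n_features=200,
                                threshold=1.96, seed=0):
    """Count purely random features that look 'significantly' tied to altitude.

    All data here is simulated; nothing is drawn from real languages.
    """
    rng = random.Random(seed)
    # Hypothetical setup: each language is 'high altitude' with probability 0.5.
    high = [rng.random() < 0.5 for _ in range(n_languages)]
    n_high = sum(high)
    n_low = n_languages - n_high

    hits = 0
    for _ in range(n_features):
        # Each feature appears in about 15% of languages, independent of altitude.
        present = [rng.random() < 0.15 for _ in range(n_languages)]
        rate_high = sum(p for p, h in zip(present, high) if h) / n_high
        rate_low = sum(p for p, h in zip(present, high) if not h) / n_low
        pooled = sum(present) / n_languages
        # Standard error of the difference between the two proportions.
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
        if se > 0 and abs(rate_high - rate_low) / se > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_spurious_correlations(), "of 200 random features 'correlate' with altitude")
```

With a 5% significance level you should expect somewhere around one feature in twenty to pass the test by luck alone, even though none of them has anything to do with altitude. Test enough features and a ‘strongly significant’ result is practically guaranteed.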

Another example was one I discussed in my Toxic Society post. Crime rates in the United States have been dropping precipitously, and up till recently no one really knew why. In the past, drops in crime have been associated with good economic times and higher rates of incarceration, so it’s been assumed that poor economies and empty prisons lead to more crime. But as the US economy has struggled through the Great Recession, crime rates have continued to plummet, not just here but all over the world, regardless of incarceration rate. Another apparent correlation / causation link is broken.

So data can fool you into thinking you know more than you do. Even worse, you can use it to bolster ideas you’re already inclined to believe. But even worse than either of these: data can keep you from digging further to find the real causes of what’s going on.

Assume You’re Blinded

It turns out that the drop in crime rates comes not from the economy or the police work, but from environmental regulation 20 years earlier. These regulations lowered the incidence of lead in children’s brains, making them better at impulse control when they got old enough to be tempted to commit crimes. This would never have been discovered if economist Rick Nevin hadn’t followed a hunch that something was wrong with the conventional ‘data-driven’ wisdom, and undertaken a massive project to uncover the truth. He didn’t find this by looking at huge amounts of data, but by going back and questioning his assumptions.

Let’s look back at the high-altitude ejectives, and try to peel off our cultural blinders. Ejectives are found in about 15% of the world’s languages, but it so happens that none of those languages are English, Spanish, Arabic, or any other widespread language of a culture that is or was an imperialist or colonialist power. Imperialist powers tend to take over lowland areas, since they’re easily accessible from water (i.e. easier to reach with your gunboats), and generally support larger populations, are richer agriculturally, and so on. Therefore, one would expect to find languages with ejectives located in high elevations, deserts, and other relatively resource-poor and inaccessible areas.

If I’m right, then you could pick just about any linguistic feature that appears with relatively low frequency (such as object-first sentential structure, or ergative constructions) and find exactly the same geographic distribution. Object-first structure, for example, is found almost exclusively in the foothills of the Andes mountains, deep in the Amazon rainforest. Ergative languages are found in the Basque country (mountainous), the Caucasus mountains, southwestern Iran (mountainous), the mountainous Pacific Northwest, mountainous Central America and the northern Andes mountains, the largely mountainous Arctic, the mixed desert-and-mountains of the Australian outback, and Tibet. (Note that, ironically enough, there are no ejective languages in Tibet; it’s the largest exception to the ejective/elevation correlation.)

I think it would be very hard indeed to make a convincing case that sentential structure or ergativity is ‘caused’ by geographic features like elevation. Of course, no doubt somebody could come up with something plausible, because cultural biases are extremely strong.

All that said: I do think geography has an effect on linguistic sounds, but very indirectly, in more subtle ways. I think generally the path leads through culture. Geography has all kinds of effects on culture, and culture has effects on language. For example, English has (for the most part) a simpler set of consonant sounds and clusters than other Germanic languages, and it definitely has a much simpler syntax and morphology. This is because England was, for over a thousand years, subject to waves of invasions by people speaking various dialects of Germanic, and what you ended up with was sort of the simplest common denominator of them all. And England was subject to these invasions because it was an easily-accessible, poorly-defended island, wealthy in land and natural resources like lumber and tin.

(Even more subtly, I think the spiritual nature of the land has an effect on the spiritual nature of the language. But this is something I feel — I don’t really have any data, big or otherwise, to back that up…)

Seeing Past the Data

So why didn’t the anthropology professor, the linguists, or the statisticians see the link between ejectives and our imperialist history? Because they were blinded by their own cultural assumptions. They simply assumed that linguistic features were scattered randomly among the languages of the world. They didn’t stop to remember that the world’s languages were part of cultures — cultures influenced by hundreds of years of imperialism, of which they are the beneficiaries. I’m not accusing anyone of prejudice. But as George Orwell said, to see what is in front of one’s nose needs a constant struggle.

Nevin arrived at the connection between crime and lead not by looking at data, but by questioning basic economic assumptions (that environmental regulation has nothing to do with crime). I came to the connection between ejectives and imperialism by questioning common cultural assumptions. These assumptions are easy to fall into if you don’t know your history. And Big Data isn’t going to save you from that. It’ll be just another tooth on the old saw: lies, damned lies, statistics… and Big Data.

The bottom line is that, as essential as data is, it does not answer any question by itself. Whether in linguistics, business, science, or our own lives, the raw data of our experience has to be analyzed for patterns; and we’ll never see those patterns unless we have unblinkered our eyes.

Integrating Work and Spirit

For many years, I kept my spiritual life (Druidry) separated from my work (computational linguistics). Of course, there are certainly strong overlaps; you only have to look at the 50+ articles under ‘Word and Spirit’ in the sidebar to see that. And every once in a while I’d cast a spell for prosperity or something similar. And the people at work sometimes good-naturedly joke about how Druids dance naked around Stonehenge. Ha ha! Never heard that one before. But for the most part my professional life has been secular, and my religious life non-professional.

I think most people create this kind of separation, and it’s probably not healthy for us. It wasn’t really ever my intent to make this break; and it was my hope, years ago when I started practicing Druidry, that they’d come together somehow, sometime. But I didn’t know how that might happen.

Then I got a wake-up call at work: I wasn’t doing so great. My job performance had been disappointing. I needed to step up my game. And if I continued on my course, I’d be in real danger of… well, the consequences remained unspoken, but that of course made the imaginings all the more dreadful.

[Continue Reading...]

A Prayer for the New Year

“To pray for particular favors is to dictate to Divine Wisdom, and savors of presumption; and to intercede for other individuals or for nations, is to presume that their happiness depends upon our choice, and that the prosperity of communities hangs … [Continue reading]

In Which Links are Forged and Pods are Cast

My attention has been away from this blog for a while, so I thought it might be interesting to collect some links to what I've been working on. Over at Faith, Fern, and Compass, for example, I’ve contributed a couple of articles that might be of … [Continue reading]

Story, History, and Meaning

In the episode of Faith, Fern and Compass we posted this week, Alison and I talked a bit about stories, and what their purpose might be. Is storytelling something with evolutionary origins? If so, what? And why? It’s a completely open question, but … [Continue reading]

On the Meaning of Life

“In our life there is a single color, as on an artist's palette, which provides the meaning of life and art. It is the color of love.” - Marc Chagall “The meaning of life is that it stops.” - Franz Kafka “Life is without meaning. You bring the … [Continue reading]

Sodden Spring

Seattle, they say, is a rather wet city. But the last few days were sunny and warm, so I guess I was lulled into thinking (wishing? hoping?) that perhaps the worst of the showers were over. Late yesterday, in the golden late evening, Alison in a coat … [Continue reading]

The Toxic Society

I stumbled on an old, ignored piece of news the other day, which struck me powerfully. Apparently crime rates in the United States continue to plummet, despite the ongoing recession. While I had assumed that the drop in crime rate was related to our … [Continue reading]

Wilderness Among Us

Alison and I have been spending a lot of time in Seattle's parks this spring, and it got me thinking about the word park. It's an old Proto-Germanic word, originally parruk, a type of enclosure for animals, such as a sheep pen. By the mid 13th … [Continue reading]

Genesis: the Story of Why We’re Different

In the summer of 2011 I was fortunate enough to go to the Wild Goose Festival, a gathering of speakers and artists active in the "emergent Christianity" movement, and there Alison and I met up with Carl McColman, who introduced us to Mike Morell. … [Continue reading]

Gaus: Freedom, Morality, and the State

Ok, here's another book I desperately want to have (and while I'm wishing, it sure would be great to have the time to read it as well): The Order of Public Reason: A Theory of Freedom and Morality in a Diverse and Bounded World by Gerald Gaus. It's … [Continue reading]

Self-Help Love-Hate

I'd like to read some Montaigne -- partly because it's like 16th-century self-help, and partly in spite of it. I have a love-hate relationship with the idea of self-help. On the one hand, it's a genre full of charlatans, fly-by-night money-back … [Continue reading]

Moon

The moon was full this morning in Virgo -- an earth sign ruled by the messenger god Mercury. What better time to bring the moon to earth? And by coincidence (?), just as the Earth was placed directly between the sun and moon, the sun reached out with … [Continue reading]

The Upper Airs: Layers of Landscapes in Meditation

In meditation I almost always return to an inner landscape which I've described in a lot of detail elsewhere, but starting about a year ago I discovered I had access to another world, one that felt like it was directly above the old one -- as if it … [Continue reading]

Sound

This afternoon, shortly before four o'clock, the sun, which had been low and sickly most of the day, began to seriously consider setting, her flames licking the clouds and igniting them all along the horizon above the Olympic mountains, and tracing … [Continue reading]