Wednesday, June 22, 2016

The Biology of Password Security

If you can, try to recall exactly a complete sentence that you just said to another human being. One that wasn't very long or very short, and wasn't a cliché or a quote. One that was grammatically correct to you.

That sentence is more than likely unique in human history. At the very least, if you type it into Google, with quotes around it, you are unlikely to find it. Try it a few times. 

People are often under the mistaken impression that all human sentences must be on the internet, making up a kind of corpus of all languages. Nothing could be further from the truth. Any natural language 'corpus' is a finite set of captured sentences: they are superficial artifacts of the complex human thought that produced them. Even if a corpus were, somehow, an infinite set of sentences, it wouldn't be the right infinite set, because we don't yet know what that set is.

It's not possible to get a machine to automatically generate your sentence. There is no generator for all (and only) the sentences of any human natural language, for a very simple reason: the mechanism that produces language is in the human brain. The brain is an extremely complex biological system, and we understand some things about it, but not much. It is highly structured, but we do not know the structure, and in fact we have only a few dozen reliable hints about it, despite centuries of intense work by legions of linguists. Since that biological structure is a major factor in any natural language grammar, we have no worked-out grammar (syntax) for any human language, in the sense of an explicit definition of an infinite set of sentences. The actual grammar is a faculty of our brain, the faculty we use both to generate sentences and to evaluate whether something is grammatical. It is part of our biology, and we have no more conscious access to its detailed operation than we do to our visual system, or, for that matter, our digestive system. To find out things about its operation, we must construct experiments that test this biological grammatical 'meter', this language faculty, in the same way we construct experiments on our visual system with optical illusions.

This is an ongoing research program, and we'll all be dead before the human natural language mechanism is understood well enough to build a generator for all and only the sentences of a natural language.

So it's quite safe to take any natural sentence, like this one [but not this one, of course, since it's been written down!], and use it as your password. (Passwords like these are also known as passphrases.)

It's also easier to remember natural sentences than the usual nonsense of random letters, digits, and symbols. But remembering natural sentences is not trivial. You need to train yourself and learn to be sensitive to your own speaking, including speaking to yourself (we mostly use our language faculty to talk to ourselves). If you're a writer or an actor, you may already have practiced this skill of remembering sentences, which we often call an "ear". But anyone can do it. It's part of our genetic endowment.

If we could develop a culture of language sensitivity, we'd have far fewer problems with passwords. Those silly and unnecessary "trick" password generators would then become a thing of the past.

One more note about infinite sets, because there's a misconception about them. There is an infinite number of infinite sets, but only a limited number of infinite sets that are human languages: at most a finite multiple of the number of people who have ever lived.

A hypothetical infinite set can be aspirational (all the sentences of English), but an actual infinite set requires a generator function. We can prove that there's an infinite subset of sentences within English ("this, and that, and this ..."), which proves that the hypothetical full set is infinite. But we don't have a generator for all and only the sentences of English, or of any of the other billions of languages that have ever existed (assuming, again, that the upper bound is some multiple of the number of individuals, each with a somewhat unique language of their own).
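To make the difference concrete, here is a minimal Python sketch of a generator for that one infinite subset, the "this, and that, and this ..." coordination pattern. The function name `coordinations` is a hypothetical illustration, not anything from linguistics software. Each output is longer than the one before, so the subset never runs out; but the sketch covers only this narrow pattern and is nowhere near a generator for all and only the sentences of English.

```python
from itertools import cycle, islice


def coordinations():
    """Hypothetical generator for one infinite subset of English:
    'this', 'this, and that', 'this, and that, and this', ...
    It captures only this coordination pattern, not English as a whole."""
    fillers = cycle(["this", "that"])  # alternate the two conjuncts forever
    conjuncts = [next(fillers)]
    while True:
        yield ", and ".join(conjuncts)
        conjuncts.append(next(fillers))  # each step adds one more conjunct


# Print the first three members of the (endless) family.
for phrase in islice(coordinations(), 3):
    print(phrase)
```

Because the generator can always produce a longer phrase, it proves the subset is infinite; but no comparable function exists for the full set, which is why a real sentence you just said is out of its reach.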

However, the joy of natural science is to discover more about the structure that is universal across all this variation. "Universal grammar" just means the part of language that is our genetic endowment. In this sense, every human language is the same. And until we understand universal grammar, we cannot have a single complete generator for any particular natural language. Note also that we use our brains to decide what to say: language is, after all, the expression of thought. So until we understand thought, we won't be able to generate all and only the sentences of any language.

So, again, your real human sentence is safe from hackers. (That would have been a good one.)