Sunday, June 23, 2013

On evolutionary trees

In a previous post, I showed this diagram from a paper in Nature of the evolutionary relationship between bonobos (B), chimpanzees (C) and humans (H).
What I find fascinating is that within the H line, we know of this complexity with good precision (based on these Smithsonian Institution pages, which are well worth reading).
What the crude graphic attempts to portray is that modern humans and Neanderthals had a common ancestor somewhere around 600,000 years ago. Then, about 130,000 years ago, modern humans began their last round of migration from Africa, with A representing African humans and B representing out-of-Africa humans. Neanderthals (N) became extinct some 35,000 years ago, but not before exchanging genes with the out-of-Africa humans. Some 2.5% of the out-of-Africa human genome is supposedly Neanderthal in origin.

So that nice tree isn't really a tree at all, not at this level of resolution. Further, we are just lucky to know about the Neanderthal-human interbreeding; there are likely other now-extinct populations with whom genetic exchanges happened. And all this happened in only about half of the 1 Myr spanned by the first diagram.
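
To make the "not really a tree" point concrete, here is a minimal Python sketch. The node names ("ancestor", "modern") and both helper functions are my own illustrative assumptions, not anything taken from the Nature paper or the Smithsonian pages. It records each population's parent populations and checks whether the result is still a tree, which requires that every population except the root has exactly one parent.

```python
# Hypothetical labels: N = Neanderthals, A = African humans,
# B = out-of-Africa humans, as in the crude graphic described above.
tree_only = {
    "ancestor": [],            # common ancestor, ~600,000 years ago
    "N": ["ancestor"],
    "modern": ["ancestor"],    # modern humans before the last migration
    "A": ["modern"],
    "B": ["modern"],
}

# Same populations, plus the gene flow from Neanderthals into B.
with_admixture = dict(tree_only, B=["modern", "N"])


def is_tree(parents):
    """True if there is one root and every other node has one parent.

    Assumes the graph is acyclic and connected, which holds for the
    small hand-written examples here.
    """
    roots = [node for node, ps in parents.items() if not ps]
    others = [ps for ps in parents.values() if ps]
    return len(roots) == 1 and all(len(ps) == 1 for ps in others)


def reticulations(parents):
    """Nodes with more than one parent, i.e. where lineages merge."""
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


print(is_tree(tree_only), reticulations(tree_only))
print(is_tree(with_admixture), reticulations(with_admixture))
```

A single extra edge is enough to turn the tree into a network, and every further exchange with an extinct population we don't know about would add another such merging node.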

We have some hope of deciphering some of this non-tree nature because the genome is huge: some three billion base pairs in which the evolutionary changes can be traced. When it comes to language and the construction of language trees, we have much less data. The ancient Rg Veda gives us a vocabulary of a few thousand words in 10,552 verses, and the Rg Veda is the exception, not the rule, among ancient texts. Something like the Behistun Inscription has perhaps 500 lines; I'm not sure how many unique words it has. Tracing language family trees requires cognate words to be present in both languages, which greatly reduces the number of words on which family trees can be built. We have just about enough data to infer a family tree, but not the cross-linkings that likely occurred, which, IMO, are much more probable with languages than with genes.

We have to understand the tree diagrams as an approximation: a model of reality, not reality itself. With limited data, we cannot do any better.