I am trying to square
Metspalu et al.'s 2011 paper and
Lazaridis et al.'s 2016 paper.
Here's what's bugging me. The diagram below is a crop from a diagram in the Metspalu paper, a graphical representation of the genetic make-up of various populations.
You can see South Asia is mainly k5 and k6. k6 is mostly confined to South Asia, while k5 extends into Central Asia, the Caucasus, the Middle East and into Western Europe.
They write:
We found no regional diversity differences associated with k5 at K = 8.
Thus, regardless of where this component was from (the Caucasus, Near
East, Indus Valley, or Central Asia), its spread to other regions must
have occurred well before our detection limits at 12,500 years.
Accordingly, the introduction of k5 to South Asia cannot be explained by
recent gene flow, such as the hypothetical Indo-Aryan migration. The
admixture of the k5 and k6 components within India, however, could have
happened more recently—our haplotype diversity estimates are not
informative about the timing of local admixture.
___________
PS: thanks to guest's comment below, I know clarification is needed: Metspalu et al. run
ADMIXTURE, which estimates what fraction of each individual's ancestry, in a sample of unrelated modern individuals, derives from each of a set of postulated ancestral populations.
The typical dataset consists of genotypes at a large number J of single nucleotide
polymorphisms (SNPs) from a large number I of unrelated individuals. These individuals are
drawn from an admixed population with contributions from K postulated ancestral populations. Population k contributes a fraction q_ik of individual i’s genome.
You try various values of K, and ADMIXTURE also estimates the standard errors on the results. Reich et al. 2009 used Principal Components Analysis (PCA) to come up with ANI/ASI. ADMIXTURE itself apparently dates from 2009.
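To make the q_ik notation concrete, here is a minimal sketch of the binomial likelihood that, as far as I understand it, sits underneath ADMIXTURE-style models: the genotype of individual i at SNP j is treated as two draws of an allele whose frequency is the q-weighted average of the ancestral frequencies. The function name, the toy numbers and the matrix f (allele frequencies in the postulated ancestral populations) are my own illustration, not anything taken from the Metspalu paper or from the ADMIXTURE source code.

```python
import math

def admixture_log_likelihood(genotypes, q, f):
    """Log-likelihood of a binomial admixture model, dropping the constant
    binomial coefficient.

    genotypes[i][j]: copies (0, 1 or 2) of the counted allele in individual i at SNP j
    q[i][k]: fraction of individual i's genome from ancestral population k
    f[k][j]: frequency of the counted allele at SNP j in ancestral population k
             (hypothetical input; must keep every mixed frequency strictly
             between 0 and 1)
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Allele frequency expected for this individual at this SNP.
            p = sum(q[i][k] * f[k][j] for k in range(len(f)))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

# Toy data: one individual, three SNPs, K = 2 ancestral populations.
genotypes = [[2, 0, 1]]
f = [[0.9, 0.1, 0.5],
     [0.1, 0.9, 0.5]]

ll_pure = admixture_log_likelihood(genotypes, [[1.0, 0.0]], f)
ll_mixed = admixture_log_likelihood(genotypes, [[0.5, 0.5]], f)
print(ll_pure, ll_mixed)
```

In this toy case the "all from population 0" assignment fits the genotypes better than a 50/50 split. The real program searches over both q and f at once for a given K; this sketch only evaluates the fit for values you hand it.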
We are told:
Choice of an appropriate value for K is a notoriously difficult statistical problem. It
seems to us that this choice should be guided by knowledge of a population’s history.
Because experimentation with different values of K is advisable, admixture prints values of
the familiar AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion)
statistics, widely applied in model selection.
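For reference, the two statistics named in the quote have standard textbook definitions, sketched below. The parameter count in `admixture_param_count` is my own assumption about how one might count free parameters (I(K-1) ancestry fractions plus K*J allele frequencies); I have not checked how the program itself counts them, and what to use for `n_obs` is likewise a guess on my part.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2p - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: p ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def admixture_param_count(n_individuals, n_snps, k):
    """Assumed count of free parameters: each individual's K fractions sum
    to 1 (K - 1 free values), plus one allele frequency per population per SNP."""
    return n_individuals * (k - 1) + k * n_snps

# Hypothetical sizes, only to show how the penalty grows with K.
for k in (2, 8, 12):
    print(k, admixture_param_count(n_individuals=100, n_snps=1000, k=k))
```

Both criteria trade goodness of fit against the number of parameters, which is why a larger K is not automatically preferred even though it always fits at least as well.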
_______________
Strictly speaking, their detection limit is 500 generations, and they use 25 years per generation. The point is that k5 in South Asia dates to more than 500 generations or 12,500 years ago.
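The conversion from generations to years is just a multiplication, but the result moves with the assumed generation time. A quick sketch: the 25 years per generation is the figure from the paper as quoted above, while the 29-year alternative is purely my own hypothetical, included to show the sensitivity.

```python
def generations_to_years(generations, years_per_generation=25):
    """Convert a time depth in generations to years."""
    return generations * years_per_generation

# Detection limit with the 25-year generation time used above.
detection_limit_years = generations_to_years(500)

# Same limit under a hypothetical longer generation time.
detection_limit_alt = generations_to_years(500, years_per_generation=29)

print(detection_limit_years, detection_limit_alt)
```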
The next step is a little leap of mine; is it justified? In the Ancestral North Indian/Ancestral South Indian (ANI/ASI) model, I take ANI to correspond to k5 and ASI to correspond to k6. It is not entirely clear to me that this identification holds.
We now come to Lazaridis et al. They do an ancient DNA (aDNA) analysis of Near Eastern samples (Near Eastern with respect to Europe) dating from 12000 to 1400 years ago, and they refer to aDNA analyses of Steppe inhabitants; as far as I know, the Steppe aDNA does not go back further than 12000 years. Now, the Lazaridis paper says:
We show that it is impossible to model the ANI as being derived from any
single ancient population in our dataset. However, it can be modelled
as a mix of ancestry related to both early farmers of western Iran and
to people of the Bronze Age Eurasian steppe...
But if ANI == k5, and k5 spread before 12500 years ago (strictly speaking, 500 generations, while the aDNA dates are presumably radiocarbon dates), why would one expect to explain ANI in terms of peoples contemporary with or younger than ANI?
Perhaps one can say that Near Eastern aDNA, Steppe aDNA and ANI (k5) all arose from the mixture of two ancestral populations X and Y (ancestors of the people of 12000 years ago), that the Near Eastern aDNA and the Steppe aDNA represent relatively unmixed descendants of X and Y respectively, and that ANI is a descendant mixture of X and Y. I don't think this is what the Lazaridis paper does.
My guess is that Lazaridis et al., instead of sticking just to genetics, also buy into the predominant theory of the spread of Indo-European languages, and hence attempt to explain ANI in this illogical way; or else Lazaridis thinks the Metspalu paper is wrong and ANI (k5) is younger than 12500 years; or else I have misunderstood Lazaridis; or else ANI != k5.
PS: the most probable of the above alternatives is that I misunderstand Lazaridis; the second most probable is that Lazaridis et al. don't think Metspalu is correct, and ANI is (much) younger than 12500 years.
_____
PS: Jan 6: This larger excerpt of a diagram in Metspalu shows ADMIXTURE with K=8 and K=12 (K is the number of hypothetical ancestral populations). You can see that changing K does not really change the story that most Indian ancestry traces to two components. One would hope for an objective definition of k5 and k6 that comes out the same when computed from different but adequate samples.