March 16, 2016

South Asian autosomal structure

A recent study finds "five" components, although in practice they can be reduced to three.

Analabha Basu et al., Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. PNAS 2016. Freely accessible [doi: 10.1073/pnas.1513197113]


India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveal four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.

One of the components, very distant from the rest, is the Andamanese one (Jarawa, Onge). However, these isolated islands are not really in South Asia but in SE Asia (they lie south of Myanmar and belong to India only by historical accident), which reduces the structure of South Asia to what we can see in the following graph:

Fig. 2.
(A) Scatterplot of 331 individuals from 18 mainland Indian populations by the first two PCs extracted from genome-wide genotype data. Four distinct clines and clusters were noted; these are encircled using four colors. (B) Estimates of ancestral components of 331 individuals from 18 mainland Indian populations. A model with four ancestral components (K = 4) was the most parsimonious to explain the variation and similarities of the genome-wide genotype data on the 331 individuals. Each individual is represented by a vertical line partitioned into colored segments whose lengths are proportional to the contributions of the ancestral components to the genome of the individual. Population labels were added only after each individual’s ancestry had been estimated. We have used green and red to represent ANI and ASI ancestries; and cyan and blue with the inferred AAA and ATB ancestries. These colors correspond to the colors used to encircle clusters of individuals in A. (Also see SI Appendix, Figs. S2 and S3.)

It is quite apparent that the AAA (Ancient Austroasiatic) component behaves like the ASI (Ancient South Indian) one but with a tendency towards the ATB (Ancient Tibeto-Burman) one, strongly suggesting that it is basically the product of admixture and not a truly autonomous ancestral component.

This may be more apparent in the wider pan-Asian context:

Fig. 3.
Approximate “mirroring” of genes and geography. Genomic variation of individuals, represented by the first two PCs, sampled from 18 mainland Indians combined with the CS-Asians and E-Asians from HGDP, compared with the map of the Indian subcontinent showing the approximate locations from which the individuals and populations were sampled.

In this wider mapping (which would be even clearer if West Asian populations were included), we see that:
  1. ANI (Ancient North Indian) strongly tends to the West. In other analyses it is very similar to the Caucasus modal component, so the logical conclusion is that we are looking at a Neolithic immigrant element, much as happens in Europe.
  2. ATB (Ancient Tibeto-Burman) strongly tends to the East, more specifically SE Asia, and is therefore the mirror image of ANI, although much less influential.
  3. ASI (Ancient South Indian) is the true aboriginal (pre-Neolithic) component of India, better preserved in southern populations but more clinal than the sample choice allows us to perceive.
  4. AAA (Ancient Austroasiatic) is very similar to ASI but has some SE Asian admixture, as is logical to expect, since Austroasiatic is a SE Asian language family that likely expanded in the Neolithic.
So ASI and AAA are basically the same thing and that's why I say that the "five" components can be simplified to just three. That said, it is indeed possible that there is underlying complexity within the ASI+AAA component, but this study does not help us to clarify that.

It is true that K=4 (after exclusion of the Andamanese, K=5 with them) fits the parsimony criterion best, but K=3 is also a good fit and shows AAA exactly as I describe it: largely ASI ("aboriginal") with a significant ATB (Eastern) component. The AAA component can therefore be perceived as consolidated, homogenized, ancient admixture. Prove me wrong on this and I'll eat my words.

Caste apartheid stopped genetic flow

Quite interestingly, the authors also dwell on how the admixture process was stopped by the Gupta laws (Middle Ages), which imposed apartheid (the caste system), enforced endogamy and caused the now apparent genetic isolation of the multiple groups.

We have provided evidence that gene flow ended abruptly with the defining imposition of some social values and norms. The reign of the ardent Hindu Gupta rulers, known as the age of Vedic Brahminism, was marked by strictures laid down in Dharmaśāstra—the ancient compendium of moral laws and principles for religious duty and righteous conduct to be followed by a Hindu—and enforced through the powerful state machinery of a developing political economy (15). These strictures and enforcements resulted in a shift to endogamy. The evidence of more recent admixture among the Maratha (MRT) is in agreement with the known history of the post-Gupta Chalukya (543–753 CE) and the Rashtrakuta empires (753–982 CE) of western India, which established a clan of warriors (Kshatriyas) drawn from the local peasantry (15). In eastern and northeastern India, populations such as the West Bengal Brahmins (WBR) and the TB populations continued to admix until the emergence of the Buddhist Pala dynasty during the 8th to 12th centuries CE. The asymmetry of admixture, with ANI populations providing genomic inputs to tribal populations (AA, Dravidian tribe, and TB) but not vice versa, is consistent with elite dominance and patriarchy. Males from dominant populations, possibly upper castes, with high ANI component, mated outside of their caste, but their offspring were not allowed to be inducted into the caste. This phenomenon has been previously observed as asymmetry in homogeneity of mtDNA and heterogeneity of Y-chromosomal haplotypes in tribal populations of India (6) as well as the African Americans in United States (34). In this study, we noted that, although there are subtle sex-specific differences in admixture proportions, there are no major differences in inferences about population relationships and peopling whether X-chromosomal or autosomal data are used. We have also found our inferences to become more robust when our data are jointly analyzed with HGDP data.
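As a rough sanity check on the dating, the "about 70 generations ago" from the abstract can be turned into a calendar year. This is only a back-of-the-envelope sketch: the generation lengths tried below and the reference year are my own illustrative assumptions, not values taken from the paper, and the function name is made up for the example.

```python
# Back-of-the-envelope conversion of "generations ago" into a calendar year.
# ASSUMPTIONS (not from the paper): the generation lengths tried below
# and the reference year of 2015.

def generations_to_year_ce(generations, years_per_generation, reference_year=2015):
    """Approximate calendar year for an event `generations` ago.

    Positive results are years CE; zero or negative results fall before the
    Common Era (the missing year zero is ignored, which is irrelevant at this
    level of precision).
    """
    return reference_year - generations * years_per_generation


GENERATIONS_AGO = 70  # figure quoted in the abstract

if __name__ == "__main__":
    for years_per_generation in (22.5, 25.0, 29.0):  # assumed generation lengths
        year = generations_to_year_ce(GENERATIONS_AGO, years_per_generation)
        label = f"{year:.0f} CE" if year > 0 else f"{abs(year):.0f} BCE"
        print(f"{years_per_generation} years per generation: around {label}")
```

With a short assumed generation of 22.5 years the estimate falls around 440 CE, within the Gupta period (roughly the 4th to 6th centuries CE). Longer assumed generations push the date earlier, so the match with the Gupta rulers depends on which generation length one accepts.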

I can't help but find it quite curious how, once again, Indian and European histories behave so similarly: in Europe too, a simpler but equally "god-sanctioned" caste system (designed by Augustine of Hippo) was imposed upon the collapse of the Roman Empire (very similar dates). However, popular revolutions gradually but systematically destroyed it. The same is happening in India now, but on a delayed timeline. By contrast, Muslim West Asia (and surroundings) had no caste system, and that's probably why it was so successful back in the day: it allowed relatively more freedom and intellectual pursuit than neighboring social systems. Of course, this stopped being the case after the Mongol conquests, roughly coincident with the European Renaissance, when Islam cocooned itself into reactionary mode, leading to stagnation and eventually to colonial subservience.

H. heidelbergensis is Neanderthal ancestor and not 'Denisovan' cousin


The unprecedented sequencing of a small fraction of the autosomal DNA of Homo heidelbergensis from the Sima de los Huesos of Atapuerca proves that they are in the direct ancestral line of H. neanderthalensis and not particularly related to Denisovans.

Matthias Meyer et al., Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 2016. Pay per view [doi:10.1038/nature17405]


A unique assemblage of 28 hominin individuals, found in Sima de los Huesos in the Sierra de Atapuerca in Spain, has recently been dated to approximately 430,000 years ago1. An interesting question is how these Middle Pleistocene hominins were related to those who lived in the Late Pleistocene epoch, in particular to Neanderthals in western Eurasia and to Denisovans, a sister group of Neanderthals so far known only from southern Siberia. While the Sima de los Huesos hominins share some derived morphological features with Neanderthals, the mitochondrial genome retrieved from one individual from Sima de los Huesos is more closely related to the mitochondrial DNA of Denisovans than to that of Neanderthals2. However, since the mitochondrial DNA does not reveal the full picture of relationships among populations, we have investigated DNA preservation in several individuals found at Sima de los Huesos. Here we recover nuclear DNA sequences from two specimens, which show that the Sima de los Huesos hominins were related to Neanderthals rather than to Denisovans, indicating that the population divergence between Neanderthals and Denisovans predates 430,000 years ago. A mitochondrial DNA recovered from one of the specimens shares the previously described relationship to Denisovan mitochondrial DNAs, suggesting, among other possibilities, that the mitochondrial DNA gene pool of Neanderthals turned over later in their history.

Some articles that describe the findings:
at Público (in Spanish)

Meyer et al. also found two years ago that the Sima de los Huesos hominins were closer to Denisovans than to Neanderthals in mtDNA. But this sequencing of their nuclear DNA puts them much closer to Neanderthals instead.

Prüfer et al. found in 2013 that Neanderthals form a cline with "Denisovans" in nuclear DNA but not in mtDNA, in which they are closer to us. This one is a very interesting read for background, as it explores in great detail the various possible scenarios.

That "Denisovans" could be closely related to H. erectus (a catch-all term for most archaic populations, particularly in Asia) has been considered very possible before (Waddell et al. 2012), but so far there is neither genetic confirmation nor strong rejection. Getting DNA from such ancient specimens is considered a breakthrough, and this partial sequencing of remains some 400,000 years old is believed to be at the very limit of what is possible.

[Conclusions edited on Mar 19th because I got it all wrong and don't wish to keep confusing anybody else. Instead I have listed several relevant background studies; judge for yourself.]