September 29, 2015

Twitter in the Aurignacian?

Heh, why not?

The curious fact is that a flint engraving recently found in the Aurignacian layers of Cantalouette II (Dordogne, SW France) bears a striking resemblance to the logo of the social network, which is quite funny at the very least.

Otherwise it is a very impressive early artistic expression of a rare type (avians are not common in Upper Paleolithic rock art). The Cantalouette II site was a flint stone quarry used by groups of the area and Arkeobasque (which is my source) speculates that it could be an expression of "art for the sake of art", an artist's caprice with no further meaning but excellent and very unusual technique, that was probably abandoned after its execution.

September 23, 2015

Negligible genetic flow in Slavic expansion to the Balkans

A new genetic study comes to confirm what most of us already knew: that Southern Slavs don't show any significant signature of immigration from the core Slavic area North and NE of the Carpathian Mountains that can be attributed to the so-called Slavic migrations of the Dark Age.

Alena Kushniarevich et al., Genetic Heritage of the Balto-Slavic Speaking Populations: A Synthesis of Autosomal, Mitochondrial and Y-Chromosomal Data. PLoS ONE 2015. Open access [doi:10.1371/journal.pone.0135820]


The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe, in early medieval times. This expansion–mainly to East Europe and the northern Balkans–resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: ‘central-east European’ for West and East Slavs, and ‘south-east European’ for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.

This is most evident in the identity-by-descent (IBD) analysis:

Fig 4. Distribution of the average number of IBD segments between groups of East-West Slavs (a), South Slavs (b), and their respective geographic neighbors.
The x-axis indicates ten classes of IBD segment length (in cM); the y-axis indicates the average number of shared IBD segments per pair of individuals within each length class.

For those unfamiliar with this kind of analysis: shorter segments (left) indicate older relatedness, now heavily fragmented by repeated chromosome recombination, while longer segments (right) indicate more recent relatedness, which has had less time to be chopped into pieces.
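
As a rough illustration of that relation (my own rule-of-thumb sketch, not the method used in the paper): two people who share an ancestor g generations back are separated by about 2g meioses, so a segment inherited from that ancestor has an expected length of roughly 100 / (2g) cM. The function names are hypothetical, and real populations are messier, since short segments in particular can come from a very wide range of ages.

```python
# Rule-of-thumb sketch (an assumption, not the paper's method):
# a segment shared through one ancestor g generations back has crossed
# about 2 * g meioses, giving an expected length of 100 / (2 * g) cM.

def expected_segment_cm(generations: float) -> float:
    """Expected IBD segment length (cM) for an ancestor `generations` back."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


def generations_for_segment(length_cm: float) -> float:
    """Inverse of the rule of thumb: generations implied by a mean length."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return 100.0 / (2.0 * length_cm)


if __name__ == "__main__":
    for g in (5, 25, 50, 100):
        print(f"{g:>4} generations -> about {expected_segment_cm(g):.2f} cM")
```

Under this simplification, segments of 2 to 3 cM would point to ancestors very roughly 17 to 25 generations back, but that figure is only the mean for a single line of descent and should not be read as a date.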

The authors explain:
The presence of two distinct genetic substrata in the genomes of East-West and South Slavs would imply cultural assimilation of indigenous populations by bearers of Slavic languages as a major mechanism of the spread of Slavic languages to the Balkan Peninsula. Yet, it is worthwhile to add here evidence from the analysis of IBD segments: the majority of Slavs from Central-East Europe (West and East) share as many IBD segments with the South Slavs in the Balkan Peninsula as they share with non-Slavic populations residing nowadays between Slavs (Fig 4A and 4B; Table G in S1 File). This even mode of IBD sharing might suggest shared ancestry/gene flow across the wide area and physical boundaries such as the Carpathian Mountains, including the present-day Finno-Ugric-speaking Hungarians, Romance-speaking Romanians and Turkic-speaking Gagauz. A slight peak at 2–3 cM in the distribution of shared IBD segments between East-West and South Slavs (Fig 4A and 4B) might hint at shared “Slavonic-time” ancestry, but this question requires further investigation.

Another graph of interest is surely the Principal Component Analyses of the three types of genetic markers:

Fig 2. Genetic structure of the Balto-Slavic populations within a European context according to the three genetic systems.
a) PC1vsPC3 plot based on autosomal SNPs (PC1 = 0.53; PC3 = 0.26); b) MDS based on NRY data (stress = 0.13); c) MDS based on mtDNA data (stress = 0.20). We focus on PC1vsPC3 because PC2 (S1 Fig) whilst differentiating the Volga region populations from the rest of Europeans had a low efficiency in detecting differences among the Balto-Slavic populations–the primary focus of this work.

In the mtDNA graph (c) it is hard to discern any pattern, as the various studied populations seem to form rings of eccentricity around the Balkans, probably because no Western Europeans are present in this particular plot.

However in the autosomal (a) and Y-DNA (b) figures more defined patterns do emerge. Quite apparently, in all three graphs South Slavs appear as strictly Balkan.

More interesting is probably the relative position of Russian and Baltic speakers: the former show very notable diversity, almost representative of the whole East European region and again indicative of assimilation rather than replacement being the main driver of Russian ethnic expansion, at least in the North.

Baltic peoples appear intermediate between Russians and Finns (and overlapping Estonians) in the Y-DNA graph and somewhat extreme in the autosomal graph, which comes as no surprise, as they seem to be the best preserved vessel of Eastern Paleoeuropeans. Curiously a few Sorbian individuals also tend towards that same extreme, which may well be a reason to increase interest in the study of this forgotten and neglected Slavic minority of Eastern Germany. Their Y-DNA is, also intriguingly, most similar to that of Swedes, rather than to that of their geographic neighbors or ethno-linguistic relatives.

Other Western Slavs form two clearly distinct sub-clusters, with Czechs being notably more Western than Poles and Slovaks, who tend to cluster with mainline Russians and Ukrainians instead. One can of course think that this Polish-Slovak-Ukrainian-Russian cluster could be the demic or genetic core of the Slavic cluster. However I can't help but wonder how much of that clustering, as well as the differences shown by Czechs and Sorbians, should be attributed to older periods like those of Corded Ware Culture, Eastern Bell Beaker, etc.

Which is the correct date for the beginning of the SE Asian Bronze Age


According to this new study of Thai sites, the SE Asian Bronze Age, whose dating has been controversial, began probably in the late 2nd millennium BCE and not before.

Charles F.W. Higham et al., A New Chronology for the Bronze Age of Northeastern Thailand and Its Implications for Southeast Asian Prehistory. PLoS ONE 2015. Open access [doi:10.1371/journal.pone.0137542]


There are two models for the origins and timing of the Bronze Age in Southeast Asia. The first centres on the sites of Ban Chiang and Non Nok Tha in Northeast Thailand. It places the first evidence for bronze technology in about 2000 B.C., and identifies the origin by means of direct contact with specialists of the Seima Turbino metallurgical tradition of Central Eurasia. The second is based on the site of Ban Non Wat, 280 km southwest of Ban Chiang, where extensive radiocarbon dating places the transition into the Bronze Age in the 11th century B.C. with likely origins in a southward expansion of technological expertise rooted in the early states of the Yellow and Yangtze valleys, China. We have redated Ban Chiang and Non Nok Tha, as well as the sites of Ban Na Di and Ban Lum Khao, and here present 105 radiocarbon determinations that strongly support the latter model. The statistical analysis of the results using a Bayesian approach allows us to examine the data at a regional level, elucidate the timing of arrival of copper base technology in Southeast Asia and consider its social impact.

Fig 8. Bayesian probability functions (PDFs) for the beginning of the Bronze Age in Thailand.

September 17, 2015


This is something I've been chewing on for more than a year now and yet never got myself to blog about (although I have mentioned it in private or in comments here and there). Impelled by the minor but quite apparent NE African influence, genetic and cultural, on the Neolithic peoples of the Levant, whose offshoots eventually landed in Greece triggering the European Neolithic, I decided in the spring of 2014 to explore, via mass lexical comparison, whether the Basque language (and by extension the wider Vasconic family, which I now believe to be that of the mainline European Neolithic) might have any relation to the Nubian languages. I did not expect to find anything but noise, but to my surprise the number of apparent cognates is quite significant.

My primary analysis was this one but now I have combined it with a comparison with Proto-Indoeuropean (PIE), which is also very probably related to the roots of Vasconic: LINK (open office spreadsheet). 

The synthesis is as follows:

Of course the "cognates" are only apparent cognates at this stage of the research and the evaluation is necessarily subjective. But judge for yourselves.

If we discard the "weak" apparent cognates, the vocabulary correlation between Basque and Nubian and between Basque and PIE is pretty similar. But, in my understanding, both are well above the noise threshold, an example of which could be the PIE-Nubian apparent cognates, which are far fewer.

I must say anyhow that the oblique apparent cognates, that is, cases where a word does not resemble its strict synonym so much as a word of related meaning (for example words meaning hot and fire), all look very solid and most intriguing.

Also, when attributing probabilities to origins of Basque words, Nubian appears to be at the origin of almost twice as many words (26%) as can be attributed to PIE (15%). Of course, for lack of data or because they actually have other origins, the majority of words (56%) remain of unknown origin, about twice the share attributed to Nubian.

However Nubian here is made up of three different languages (Dilling, Nobiin and Midob), while PIE is just a single theoretical construct. PIE must be treated this way because many modern and historical IE languages, notably in Europe, carry other Vasconic substrate influences, which must be studied separately from the general PIE-Vasconic shared vocabulary. This kind of late Vasconic influence is, by contrast, very unlikely in the case of Nubian. In any case I don't know of any readily available proto-Nubian Swadesh list.

Finally I must mention that because the PDF format is horrible for copy-pasting, I chose to re-transcribe the Nubian words according to my best approximation using a normal keyboard (not always the same characters that the original list uses).

Strongest Basque-Nubian apparent simple cognates

  • Basque - Nubian languages (English)
  • azal - àzì, àzzì-di (bark)
  • haragi - árízh (meat)
  • odol - ógór, èggér (blood)
  • buru - úr (head)
  • oin - ó:y (foot)
  • esku - ish-i, ès-sì (hand)
  • hil* - di-ìl (to die)
  • euri - are, ara, áwwí, áré, árí, áró (rain)
  • harri - kugor, kakar (stone) [notice also the pre-IE root *kharr- speculated to be at the origin of Karst, etc.] 
  • lur - gùr (soil, ground)
  • haize - irsh-i, éss-í (wind)

There are some others that are shared with Indo-European and with similar subjective "weight", not listing them here to keep things clear. There are also other apparent cognates that are arguably less clear like bat - be (one) that I'm also skipping here but you can find in the spreadsheet.

*Hil (meaning both to die and to kill in Basque, which can't be confused because they conjugate differently) seems ancestral to English ill and kill (this one via a Germanic precursor).
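
For anyone who wants a less subjective first pass over pairs like the ones listed above, here is a minimal sketch of a crude similarity score. It is not how I evaluated the list (that was done by eye): it simply strips diacritics and punctuation and compares the remaining letters with Python's standard difflib. The function names are hypothetical, and plain string similarity is of course no proof of cognacy, since it ignores regular sound correspondences altogether.

```python
import unicodedata
from difflib import SequenceMatcher


def simplify(word: str) -> str:
    """Lower-case a word and drop diacritics, hyphens, colons, etc."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # After NFD the accents are separate combining marks, which are not
    # alphabetic, so keeping only alphabetic characters removes them.
    return "".join(ch for ch in decomposed if ch.isalpha())


def similarity(a: str, b: str) -> float:
    """Crude 0..1 similarity between two simplified word forms."""
    return SequenceMatcher(None, simplify(a), simplify(b)).ratio()


# A few pairs taken from the list above: (Basque, Nubian form, gloss)
PAIRS = [
    ("azal", "àzì", "bark"),
    ("odol", "ógór", "blood"),
    ("buru", "úr", "head"),
    ("esku", "ès-sì", "hand"),
    ("lur", "gùr", "soil, ground"),
]

if __name__ == "__main__":
    for basque, nubian, gloss in PAIRS:
        score = similarity(basque, nubian)
        print(f"{basque:6} {nubian:7} {gloss:14} {score:.2f}")
```

The same function could be run over unrelated word lists to get a feel for what pure noise looks like, which is the comparison that really matters.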

The intriguing oblique cognates

Notice that these words do not mean the same, yet their meanings seem strikingly related.
  • Nubian (English) - Basque (English)
  • hor, koy, kà:r (tree) - harri (stone) [notice that zuhaitz (tree) can be interpreted etymologically as zur-haitz = wood-rock, so the relation is not that weird]
  • ok-i, og (breast) - ogi (bread)
  • a-l (heart) - ahal (can (verb), potential, power)
  • azh, àz-ír, àzza (to bite) - (h)ortz (tooth), aitz (rock, peak) [some argue that originally "to cut", present in many cutting tool names: aizkor = axe, aitzur = hoe, aizto = knife, etc.*]
  • shu, zhúù (to walk) - joan (to go) [often pronounced jun or shun]
  • é:zhi (water) - heze (wet) [also archaic particle *iz-, meaning "water" by all accounts: itxaso = sea, izurde = dolphin, izotz = ice, and common in Vasconic river toponymy]
  • zhuge (to burn) - su (fire) 
  • zhùg, sù, sú:w (hot) - su (fire)
  • úr-i, úrúm (black) - urdin (blue) [archaic also green, grey]

*This one is an obvious and very prevalent Vasconic substrate infiltrator in Western IE languages: axe, adze, azada (hoe in Spanish), etc.

How can this be possible?

It is of course a mere working hypothesis and ultimately you judge, but I find it hard to dismiss. However there is no apparent connection, notably no significant genetic connection, between Basques and Nubians. So how can we explain this?

I have a reasonably clear idea myself, so I made a map to explain it:

Basque is after all just the last survivor of a once much larger family (Vasconic), a family that most likely corresponds to the languages spoken by the early European farmers (mainline Neolithic of Aegean roots). As that expansion was largely done in about a mere thousand years, I estimate that when both branches met near the Rhine, the two peoples could still understand each other, even if with some difficulty. Only the Southern/Western branch(-es) survived long enough to leave historical evidence, so it is hard to guess how the Northern branch evolved anyhow.

The Nubian linguistic connection is anyhow not the only thing that requires the Levant or Palestinian Neolithic step: so do Y-DNA E1b-M78 (mostly V13 in Europe, attested in some early farmers and still very important among Greeks and Albanians particularly) and probably the so-called "Basal Eurasian" component that Lazaridis detected among early European farmers and that could well be the signature of African genetics from the Nile.

Linguistically, also the very notorious presence of Semitic (an Afroasiatic branch) in West Asia is surely another legacy of the same African influences in the Mesolithic Levant. Before this research, I thought it was the only one but now I strongly suspect that at some point Nubian (Nilo-Saharan) languages were also present in the region. Maybe one (Nubian evolving towards Vasconic) corresponded to Natufian proper and the other (proto-Semitic) to Harifian, the semi-desert pastoralist facies of the same wider culture. Can't say for sure.

The chain was once long but now only some of the most distant links remain unbroken. It is difficult to imagine that they were ever connected at all...

To do...

A lot remains to be done, of course:
  • These mass lexical comparisons only apply to a few families in the region and the rest should also be tested for. My energies are limited and so are my qualifications as "linguist", so I encourage others, hopefully more energetic and knowledgeable, to expand.
  • Grammatical features cannot be analyzed by this methodology. Again my means are limited. 
  • Anthropological research would be an interesting complement. So far the only shared cultural trait I could spot would be the use of bells attached to ankles for dancing but there could be others. 
  • ...

Detailed analysis of South Iberian Solutrean

A new study has been published that reviews all the data on the Southern Iberian Solutrean, which (probably with the exception of Asturias) is a distinct autonomous facies relative to the Franco-Cantabrian Solutrean.

João Cascalheira & Nuno Bicho, On the Chronological Structure of the Solutrean in Southern Iberia. PLoS ONE 2015. Open access [doi:10.1371/journal.pone.0137308]


The Solutrean techno-complex has gained particular significance over time for representing a clear demographic and techno-typological deviation from the developments occurred during the course of the Upper Paleolithic in Western Europe. Some of Solutrean’s most relevant features are the diversity and techno-typological characteristics of the lithic armatures. These have been recurrently used as pivotal elements in numerous Solutrean-related debates, including the chronological organization of the techno-complex across Iberia and Southwestern France. In Southern Iberia, patterns of presence and/or absence of specific point types in stratified sequences tend to validate the classical ordering of the techno-complex into Lower, Middle and Upper phases, although some evidence, namely radiocarbon determinations, have not always been corroborative. Here we present the first comprehensive analysis of the currently available radiocarbon data for the Solutrean in Southern Iberia. We use a Bayesian statistical approach from 13 stratified sequences to compare the duration, and the start and end moments of each classic Solutrean phase across sites. We conclude that, based on the current data, the traditional organization of the Solutrean cannot be unquestionably confirmed for Southern Iberia, calling into doubt the status of the classically-defined type-fossils as precise temporal markers.

Mallaetes, but not nearby Parpalló, is confirmed as one of the oldest sites of the Southern Iberian Solutrean, but has to share the honor with Nerja and La Boja. In general this would support the old idea of rapid expansion from Southern France (Dordogne is slightly older for this culture than the oldest Iberian sites) along the Mediterranean coast of eastern Iberia, mimicking what happened before with Aurignacian and Gravettian and what would happen later with Magdalenian and Epipaleolithic cultures of Magdalenian derivation.

The subsequent evolution is rather fast and does not fit the French chronology too well: Middle Solutrean is short-lived (mostly affecting Central Portugal) and almost overlaps with Upper Solutrean (oldest in Southern Portugal) and Gravetto-Solutrean (oldest in El Bajoncillo, an inland site not involved in the previous phases).

All the new phases do impact the core site of Mallaetes, which seems to be well connected.

Fig 5. Time slices for Southern Iberia between 26 and 20 ka cal BP showing the distribution of modelled ages of the classical Solutrean phases.
The size of the dots represents increasing and decreasing levels of the 95.4% probability ranges determined from the duration (date range) of each phase, as calculated by individual Bayesian site models (see Appendix A in S1 File). Dots with two colors indicate overlapping date range probabilities for two or more phases found at the same site.

The authors underline that:
Two clear tendencies can be outlined related to the distribution patterns of the Lower Solutrean and Solutreo-Gravettian type assemblages. In fact, these two components seem to be restricted to the Mediterranean region and totally absent from the Atlantic facade.

They conclude that:
... the main impacts of our analysis on the current knowledge of the LGM adaptations in Southern Iberia can be summarized as follow:
  1. The call into doubt of the status of the traditionally-defined type-fossils as precise temporal markers for each Solutrean phase in Southern Iberia;
  2. The confirmation of the presence of tanged “Parpalló-type” points at a much earlier time (c. 25 ka cal BP) than previously thought;
  3. The potential contemporaneity at a very early moment (c. 25 ka cal BP) of the so-called Middle and Upper Solutrean/Solutreo-Gravettian phases (and thus should preferably be called facies)
  4. The likely organization, from a broad chrono-cultural point of view, of the adaptive systems surrounding the LGM event in just two discrete contiguous entities, known as the Proto-Solutrean and the Solutrean.

Some further context (my elaboration)

The Iberian Solutrean (roughly coincident with the Last Glacial Maximum) was the most populous period of the Upper Paleolithic in that province, at least according to the research of Bocquet-Appel.

It was maybe even more important for North Africa (Iberomaurusian culture), something not discussed in this study but which I am aware interests many readers, as well as myself. For this reason I checked for a good reference on the oldest calibrated dates for Taforalt's Iberomaurusian (alias Oranian) and found this 2013 study, which states that it is at least as old as 21,160 cal BP.

That would correspond with the fifth map (22-21 Ka cal BP), in which we see an increase of the closest site to North Africa: Gorham's Cave. It would be indeed interesting if someone compared the specifics of Upper Solutrean and that cave with Taforalt, which is by all accounts the oldest Iberomaurusian site. 

The genesis of the Iberomaurusian, the first known Upper Paleolithic of NW Africa, surely carried a still very apparent Iberian-like genetic signature across the strait, notably mtDNA haplogroups H1, H3, H4 and H7, and also maybe V. The H subclades were claimed to have an unmistakable Iberian origin by Cherni 2008, while the distribution of the H subhaplogroups in the region was researched by Enafaa & Cabrera 2009. Comparison with Álvarez-Iglesias 2009 suggests however that H7 should be French rather than Iberian by origin, as it is rare in the peninsula. It could still be a Solutrean founder effect anyhow.

Another possible founder effect of this Paleolithic trans-Mediterranean connection might be mtDNA U6. This lineage most likely originated in Northern Morocco but also has a lot of basal diversity across the strait in Iberia. However it could also represent a so far archaeologically invisible Aurignacoid migration via NE Africa, with re-expansion to Iberia (and also within North Africa) perhaps in this period. This could also explain its apparent connection with Y-DNA E1b-M81, which seems very old in NW Africa and is distributed in a similar way to U6 in the Iberian Peninsula and Europe in general.

September 9, 2015

Detailed analysis of ancient Atapuerca genomes

Everybody seems to be buzzing about this study on Atapuerca (El Portalón) site's Chalcolithic and Bronze Age genomes and I do think it is indeed worth taking a good look.

Torsten Günther, Cristina Valdiosera et al., Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. PNAS 2015. Open access [doi: 10.1073/pnas.1509851112]


The consequences of the Neolithic transition in Europe—one of the most important cultural changes in human prehistory—is a subject of great interest. However, its effect on prehistoric and modern-day people in Iberia, the westernmost frontier of the European continent, remains unresolved. We present, to our knowledge, the first genome-wide sequence data from eight human remains, dated to between 5,500 and 3,500 years before present, excavated in the El Portalón cave at Sierra de Atapuerca, Spain. We show that these individuals emerged from the same ancestral gene pool as early farmers in other parts of Europe, suggesting that migration was the dominant mode of transferring farming practices throughout western Eurasia. In contrast to central and northern early European farmers, the Chalcolithic El Portalón individuals additionally mixed with local southwestern hunter–gatherers. The proportion of hunter–gatherer-related admixture into early farmers also increased over the course of two millennia. The Chalcolithic El Portalón individuals showed greatest genetic affinity to modern-day Basques, who have long been considered linguistic and genetic isolates linked to the Mesolithic whereas all other European early farmers show greater genetic similarity to modern-day Sardinians. These genetic links suggest that Basques and their language may be linked with the spread of agriculture during the Neolithic. Furthermore, all modern-day Iberian groups except the Basques display distinct admixture with Caucasus/Central Asian and North African groups, possibly related to historical migration events. The El Portalón genomes uncover important pieces of the demographic history of Iberia and Europe and reveal how prehistoric groups relate to modern-day people.

It must be said to begin with that the Atapuerca samples are actually about as closely related to Basques as to Sardinians: they have more Paleoeuropean admixture than Sardinians and early European farmers but not quite as much as Basques. The various formal analyses, such as the one displayed to the right (fig. 3-B), confirm this intermediate position between what I'd call First Neolithic and Atlantic Neolithic genetic configurations, which can be directly associated with modern Sardinians and modern Basques respectively.

Other data, such as the mtDNA pool or the lack of the lactase persistence allele, also place them rather in the First Neolithic group in spite of their greater Paleoeuropean admixture, which is undeniable. The El Portalón samples have much lower frequencies of mtDNA H and U than Neolithic Basques or Burgundians, let alone the "hyper-modern" Neolithic Portuguese with their >80% mtDNA H (Chandler et al. 2005).

Similarly their lack of the T-13910 lactase persistence allele, dominant among modern Basques and many other Western Europeans, and already detected in at least some Chalcolithic Basques from an intermediate area (Upper Ebro banks), suggests that the old archaeological and anthropometric narrative of Mediterranean colonists migrating up the Ebro and establishing intermediate (but still rather Mediterranean) populations on the Upper Ebro banks, somewhat distinct from the proto-Basques proper of a distinctive Pyrenean (Keltid?) type, was not completely wrong. It is true that the Pyrenean type can no longer be considered a pure derivative of Paleoeuropeans, but rather a mixed population with strong Mediterranean Neolithic input, but there is still a distinction, very apparent in the archaeogenetics we know so far, that cannot be totally ignored.

Not a highlight of the study or the press release but I think it is very worth mentioning that the Bronze Age ATP9 woman also shows strong affinity with Britons, particularly Cornish and Scots. In general there is stronger mainland European affinity for this sample but still Basques and Sardinians, as well as the mentioned Britons, are the closest matches.

The general details of the samples are as follows:

As you can see only four samples had sufficiently good coverage to be considered for most analyses. Mitochondrial DNA, as usual, is the exception (all are good enough) but Y-DNA cannot be reliably assessed from such poor quality sequences, nor actually can much else that is not fuzzy.

Stopping for a moment on Y-DNA, it must be said that haplogroup H2 was formerly known as F3, being a rare West Eurasian haplogroup (with some presence in the Persian Gulf and Zagros mountains area, as well as a scatter through Europe) which has seen its phylogeny recently refined under H (otherwise a South Asian and Roma lineage). 

I2a2a is not the typical Sardinian and Pyrenean lineage that is generally considered to be part of the Cardium Pottery package, originating probably in the Balkans, even though I first thought it would be (thanks to Krefter for the correction). Instead it is some other Paleoeuropean lineage, which is today most concentrated in Northern Europe (→ map).

Genetiker claims that ATP3 should be R1b1a2-M269, while ATP17 would also be I2a2a. However given the very low coverage of these genomes, I would take such claims with great caution. As I've written elsewhere, the question is anyhow not whether there is some R1b of any sort here or there, because the M269 → L11 stage has left only a scattered legacy, except for two large subhaplogroups: S116/P312 and U106. These two subhaplogroups are the big mystery and for them we only have so far late Chalcolithic terminus ante quem dates from Bell Beaker Germany and Corded Ware Sweden respectively. The lack of Atlantic Neolithic samples, be they British, French, Basque or Portuguese, surely has a lot to do with this lack of evidence because, you know, follow the trail of modernity traits such as early "modern" mtDNA pools (Neolithic Basques and North French, Portuguese also) or early presence of the lactose tolerance allele (Chalcolithic Basques and Swedes). The answer to this pressing question must be in the Atlantic basin of Europe, just sequence it for Chaos' sake!

As for the mtDNA, the genetic pool is partly typical Neolithic (K, J, X) but with notable highlights: on one side, there are 3/8 U5 lineages, which are surely a legacy of the admixture with Paleoeuropeans, and, on the other side there is this H3 haplogroup that is nowadays important especially in Southwestern Europe. Unlike H1 or H6, which are known to have been carried by Paleoeuropeans, H3 so far has only been found among Neo-Europeans (i.e. from Neolithic onwards) but we do not know so far its ultimate origins for sure, as happens with haplogroup V (similar situation). 

In any case the low frequency of haplogroup H makes the overall pool not yet "modern", unlike what happens with at least some Neolithic Basques (Paternanbidea) and certainly with proto-Basques from the Chalcolithic onwards, as well as other mentioned groups like North French Neolithic, Portuguese Neolithic and later also German Bell Beaker peoples. A more complete comparison of various ancient mtDNA pools can be found in fig. S4.

Going back to the main focus of the study, which is autosomal DNA, I guess that we can continue with the following excerpt from the ADMIXTURE analysis:

Fig. 3(A) - Population structure of ancient and modern-day individuals. (A) Admixture fractions among modern-day individuals from Eurasia and North Africa together with 16 ancient individuals. Only ancient and modern-day individuals from Southwestern Europe are shown (see Dataset S1 for the complete plot with all individuals). Admixture components are labeled based on the populations/geographic regions in which they are modal.

The two main components are (1) the Paleoeuropean or HG modal, whose fractions may well be close to real in this case, at least judging by the strict alignment of Europeans between Paleoeuropeans and West Asians provided by the PCA with North African samples (see below) and (2) the West Asian or EEF modal. We see that the First Neolithic populations were around 20-25% Paleoeuropean (the orange fraction would be West Asian or mostly so), while the Atapuerca samples show double HG scores, c. 40-50%. Modern Basques are even higher, around 50-60% Paleoeuropean in this analysis (although obviously a small fraction should be attributed to the Early Farmers' inflow).
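
To make the arithmetic behind those percentages explicit, here is a minimal back-of-the-envelope sketch. It assumes (my simplification, not something the study claims) a single extra pulse of purely Paleoeuropean ancestry added to a First Neolithic base, takes the ADMIXTURE fractions at face value and uses the midpoints of the ranges quoted above. The function name is hypothetical.

```python
def extra_hg_share(base_hg: float, target_hg: float) -> float:
    """Share x of additional hunter-gatherer ancestry needed so that
    x * 1.0 + (1 - x) * base_hg == target_hg (simple two-way mixture)."""
    if not 0.0 <= base_hg < 1.0:
        raise ValueError("base_hg must be in [0, 1)")
    if not base_hg <= target_hg <= 1.0:
        raise ValueError("target_hg must be between base_hg and 1")
    return (target_hg - base_hg) / (1.0 - base_hg)


if __name__ == "__main__":
    first_neolithic = 0.225  # midpoint of 20-25% (assumed)
    atapuerca = 0.45         # midpoint of 40-50% (assumed)
    basques = 0.55           # midpoint of 50-60% (assumed)
    for name, target in (("Atapuerca", atapuerca), ("Basques", basques)):
        x = extra_hg_share(first_neolithic, target)
        print(f"{name}: about {x:.0%} extra Paleoeuropean input")
```

Under those assumptions the Atapuerca farmers would need something like 29% additional Paleoeuropean input on top of a First Neolithic base, and modern Basques around 42%, which gives an idea of the scale of the admixture being discussed.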

In any case we see that in general the increase of the Paleoeuropean fraction is very notable in Atapuerca and Gokhem, and was plausibly even greater among other Atlantic early farmers, judging by the modern Basque and Gascon ("French South") data.

The next most relevant component is the Caucasus/Central Asian one, which should almost certainly be attributed to the Indoeuropean expansion. These newcomers would also have contributed a proportional fraction of Paleoeuropean blood (maybe double the black segment or something along those lines). We see that this Indoeuropean influence is most important among the French and much less relevant in Iberia (but still much greater than among Basques or Sardinians). Nothing really new in this after Lazaridis 2014, Haak 2015 and Allentoft 2015, just a complementary perspective on the same issue.

As for the North African component, it seems to follow the same pattern: it is mostly a feature of the Western third of the Iberian Peninsula rather than something attributable to "historical events" (such as the Muslim period or the Phoenician conquest, much more influential in the South and East instead). Again, the lack of ancient DNA from those regions has hindered the understanding of the origin of this component, which can be either (1) a Neolithic founder effect or (2) a Paleolithic founder effect dating from as early as the Solutrean-Oranian interaction around the Last Glacial Maximum. The fact that La Braña shows North African or otherwise African affinities in some analyses suggests that it is a Paleolithic element that has remained basically a Western Iberian thing (except for the occasional founder effect, such as in a certain district of Northern Wales, and some diffuse scatter of related markers like E1b-M81 along the Atlantic coast of Europe).

If this interpretation is correct, then it weighs strongly against the hypothesis of a widespread (Western) Iberian origin of Chalcolithic (Megalithic, Bell Beaker) founder effects through Western Europe, because we would see that component everywhere. Instead I'm much more inclined towards a major role for what are now the Western parts of France, which were clearly involved in the (late) Neolithic colonization of Britain. I'd rather advocate for a "French" origin of Y-DNA haplogroup R1b-S116, for example.

True, studying the Hexagon is sadly hampered by a hostile bureaucracy, legal framework and state ideology, but no such obstacle seems to impede the study of ancient Britons, for example. In due time, I guess.

Another highlight is the TreeMix graph. However I must say that I am much more comfortable with the S10 tree than with the one highlighted in the main body of the study. The main difference is that in the latter Ötzi (Iceman) is placed near Gokhem and also shows some Pitted Ware type admixture. This is new and therefore a bit disturbing. The S10 tree, by contrast, is not so surprising:

Compare with the simplified 2-A tree to the right...

You choose, as not enough data is provided in the paper to argue for one choice over the other.

It is apparent in any case that Gokhem seems to have strong admixture from a source that is a precursor of Pitted Ware Ajv58. This may also be the case of Ötzi (?), although it'd be weaker.

Also, in spite of claims of La Braña admixture in Atapuerca, I do not see that element clearly in either graph. In one it is apparent that the Paleoeuropean influence is rather Loschbour-like, while in the other it seems to be something rather ancestral to both WHG individuals.

Confused? Well, what about the pre-Motala 46% admixture in Mal'ta 1? That is indeed new and it is consistent in both graphs. I would think it is suggesting that, rather than Ma1 (ANE) admixture in SHG, what we have is an ancient flow from Europe to LGM Siberia. Which can that be? Gravettian culture only. Oddly enough it makes some sense but, if this is correct, then it should also say that all the ANE buzz was a bit nonsensical after all. Confounding factors at play. 

To finish this review I'll copy here the various Principal Component Analyses, eye-candy, which in this case have diverse sampling strategies. Oddly enough for someone who has asked for Europe-only PCAs, it seems to me that in this case the one including West Asians is a better approximation to the reality of Atapuerca's ancients:

Fig S7

As you can see, the three Chalcolithic samples properly sit in this case right between Basques and Sardinians. The position of the Bronze Age sample should be explained by the British affinity, rather than being truly closer to Spaniards.

All this is less clear in the Europe-only analysis:

Fig 2-C

They do appear correctly between Basques and Sardinians, but at a superficial glance they could look almost like Spaniards, something that is not quite correct. Again the Bronze Age sample is pulled towards Britain, incidentally (and misleadingly) overlapping Spaniards.

Someone somewhere asked for a PCA with North African samples. Well, this study also includes one. However no North African influence is apparent in any European, ancient or modern, so the best use we can make of it is to take the linear arrangement of Europeans between the Paleoeuropean and the West Asian polarities as a ruler of sorts to estimate the levels of admixture of the various populations:

Fig S6
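For those who want to try the "ruler" idea numerically, here is a minimal sketch. It assumes that a sample's position along the line joining two reference poles in PCA space can be read as a rough admixture proportion, which is only an approximation. The function name and all coordinates are invented for illustration and are not taken from the study.

```python
def admixture_fraction(sample, pole_a, pole_b):
    """Project `sample` onto the line pole_a -> pole_b in PCA space.

    Returns the position along the segment: 0.0 at pole_a, 1.0 at pole_b.
    """
    # Vector from pole A to pole B, and from pole A to the sample.
    a_to_b = [b - a for a, b in zip(pole_a, pole_b)]
    a_to_sample = [s - a for a, s in zip(pole_a, sample)]
    length_sq = sum(c * c for c in a_to_b)
    if length_sq == 0:
        raise ValueError("the two poles must be distinct")
    # Scalar projection, normalised by the squared pole-to-pole distance.
    return sum(x * y for x, y in zip(a_to_sample, a_to_b)) / length_sq


# Hypothetical (PC1, PC2) coordinates, invented for illustration only.
hunter_gatherer_pole = (-0.10, 0.00)
west_asian_pole = (0.10, 0.00)
sample = (-0.02, 0.01)

t = admixture_fraction(sample, hunter_gatherer_pole, west_asian_pole)
print(f"West Asian-like share: {t:.0%}, Paleoeuropean-like share: {1 - t:.0%}")
```

Values below 0 or above 1 would mean that the sample falls beyond one of the poles, so the two references alone cannot account for it.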

And that's all for today. Soon to come (hopefully): new Y-DNA age estimates (which I promised more than a month ago, shame on me), genetics of Baltic peoples (new study) and the Vasco-Nubian linguistic connection (something I've been ruminating on for more than a year now).

September 5, 2015

Selection of dominant alleles vs recessive ones in population bottlenecks

Quantity over quality series

A technical study, which nevertheless gives an interesting glimpse into the complex but generally reductive genetic effect of bottlenecks such as the Out-of-Africa migration of humankind, is out in PLoS Genetics.

Daniel J. Balick et al., Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genetics 2015. Open access: LINK [doi:10.1371/journal.pgen.1005436]

Population bottlenecks followed by re-expansions have been common throughout history of many populations. The response of alleles under selection to such demographic perturbations has been a subject of great interest in population genetics. On the basis of theoretical analysis and computer simulations, we suggest that this response qualitatively depends on dominance. The number of dominant or additive deleterious alleles per haploid genome is expected to be slightly increased following the bottleneck and re-expansion. In contrast, the number of completely or partially recessive alleles should be sharply reduced. Changes of population size expose differences between recessive and additive selection, potentially providing insight into the prevalence of dominance in natural populations. Specifically, we use a simple statistic, $B_R = \sum_i x_i^{\mathrm{pop1}} / \sum_j x_j^{\mathrm{pop2}}$, where $x_i$ represents the derived allele frequency, to compare the number of mutations in different populations, and detail its functional dependence on the strength of selection and the intensity of the population bottleneck. We also provide empirical evidence showing that gene sets associated with autosomal recessive disease in humans may have a BR indicative of recessive selection. Together, these theoretical predictions and empirical observations show that complex demographic history may facilitate rather than impede inference of parameters of natural selection.

Author Summary
Dominance has played a central role in classical genetics since its inception. However, the effect of dominance introduces substantial technical complications into theoretical models describing dynamics of alleles in populations. As a result, dominance is often ignored in population genetic models. Statistical tests for selection built on these models do not discriminate between recessive and additive alleles. We show that historical changes in population size can provide a way to differentiate between recessive and additive selection. Our analysis compares two sub-populations with different demographic histories. History of our own species provides plenty of examples of sub-populations that went through population bottlenecks followed by re-expansions. We show that demographic differences, which generally complicate the analysis, can instead aid in the inference of features of natural selection.

Fig 3. The BR statistic at the time of observation.
ABOVE: At the time of observation tobs, the value of BR(tobs) is plotted as a function of the average strength of selection s and dominance coefficient h. Dominance coefficients appear as solid lines with fully recessive selection (h = 0) at the top and purely additive selection (h = 0.5) at the bottom. For strong selection BR → 1 due to the rapid transient response. For weak selection BR → 1 due to the nearly neutral insensitivity to the bottleneck. For some intermediate dominance coefficient hc, a critical value occurs (hc ~ 0.25 in the example shown, but explored more generally in S1 Text) where additive and recessive effects cancel, yielding BR(hc) ~ 1. A low intensity bottleneck (IB = 0.05) is shown, with parameters 2N0 = 20000, 2NB = 2000, TB = 100, and tobs = 1000. BELOW: The same range of parameters is plotted for a realistic demographic model of the Out of Africa event comparing Africans and Europeans [48], where BR = ⟨x⟩African/⟨x⟩European. The European bottleneck has estimated intensity IB ~ O(0.5), an order of magnitude stronger than the simple bottleneck above, allowing for potentially observable deviations from BR ~ 1 if a large fraction of analyzed variants act recessively with h < hc ~ 0.25.

I emphasize from the erudite legend:
The European bottleneck has estimated intensity IB ~ O(0.5), an order of magnitude stronger than the simple bottleneck above.

Although Europeans are used for reference, this bottleneck and the corresponding accumulation of deleterious alleles are the same for all non-Africans.
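To make the two quantities in the legend more tangible, here is a small sketch. It assumes that the bottleneck intensity is the bottleneck duration divided by the bottleneck size, IB = TB / (2NB), which matches the legend's own numbers (100 / 2000 = 0.05) but is my reading rather than something quoted above. BR is computed as the ratio of summed derived allele frequencies, as in the legend's African/European expression. Function names and allele frequencies are made up for illustration.

```python
def bottleneck_intensity(duration_generations, bottleneck_size_2n):
    """I_B = T_B / (2 N_B): assumed definition, consistent with the legend."""
    return duration_generations / bottleneck_size_2n


def br_statistic(freqs_reference, freqs_bottlenecked):
    """B_R: ratio of summed derived allele frequencies over the same sites."""
    if len(freqs_reference) != len(freqs_bottlenecked):
        raise ValueError("both populations must be scored at the same sites")
    total_bottlenecked = sum(freqs_bottlenecked)
    if total_bottlenecked == 0:
        raise ValueError("no derived alleles in the bottlenecked population")
    return sum(freqs_reference) / total_bottlenecked


# Parameters quoted in the legend: T_B = 100 generations, 2N_B = 2000.
print(bottleneck_intensity(100, 2000))

# Made-up derived allele frequencies at four sites, for illustration only.
african = [0.10, 0.02, 0.30, 0.08]
european = [0.05, 0.00, 0.25, 0.10]
print(br_statistic(african, european))
```

According to the legend, a value clearly above 1 for a set of variants would hint at recessive selection, while values near 1 are expected for very weak or very strong selection.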

September 4, 2015

Neolithic Catalan aDNA provides further insight into European genesis

Again Marnie points me towards a very interesting study, this time totally new and dedicated to archaeogenetics.

Iñigo Olalde et al., A common genetic origin for early farmers from Mediterranean Cardial and Central European LBK cultures. Molecular Biology and Evolution, 2015. Open access: LINK [doi:10.1093/molbev/msv181]


The spread of farming out of the Balkans and into the rest of Europe followed two distinct routes: an initial expansion represented by the Impressa and Cardial traditions, which followed the Northern Mediterranean coastline; and another expansion represented by the LBK tradition, which followed the Danube River into Central Europe. While genomic data now exist from samples representing the second migration, such data have yet to be successfully generated from the initial Mediterranean migration. To address this, we generated the complete genome of a 7,400 year-old Cardial individual (CB13) from Cova Bonica in Vallirana (Barcelona), as well as partial nuclear data from five others excavated from different sites in Spain and Portugal. CB13 clusters with all previously sequenced early European farmers and modern-day Sardinians. Furthermore, our analyses suggest that both Cardial and LBK peoples derived from a common ancient population located in or around the Balkan Peninsula. The Iberian Cardial genome also carries a discernible hunter-gatherer genetic signature that likely was not acquired by admixture with local Iberian foragers. Our results indicate that retrieving ancient genomes from similarly warm Mediterranean environments such as the Near East is technically feasible.  

Very interestingly, the main hunter-gatherer input into early European farmers is not best explained by truly Western hunter-gatherers (Loschbour, La Braña) but rather by a close relative from Hungary (KO1), very probably hinting at a true Balkan source of this admixture, which TreeMix puts at the very root of the Neolithic branch as something proto-KO1 and not truly close to KO1 as such:

Fig. S7. TreeMix (Pickrell and Pritchard 2012) analysis considering: (...) (B) four migration edges.

Note: the proto-Stuttgart admixture axis to Mbuti may well be, I believe, an artifact of the famous "Basal Eurasian" thing, which is probably African admixture in the opposite direction, which is not yet well understood but quite undeniable (most patently in Y-DNA E1b). 

The overall data is analyzed in both PCA and with the ADMIXTURE algorithm:

Figure 2. Genetic affinities of CB13. (A) Procrustes PCA of hunter-gatherers, Early Neolithic, Middle Neolithic and Copper Age farmers. The PCA analysis was performed using only transversions (to avoid confounding effects related to post-mortem damage). (B) Ancestry proportions assuming 11 ancestral components, as inferred by ADMIXTURE analysis.
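A side note on the "transversions only" filter mentioned in the caption: post-mortem damage in ancient DNA mostly shows up as C→T and G→A changes, which are transitions, so keeping only purine-pyrimidine swaps avoids them. The following is a minimal sketch of such a filter, not the authors' pipeline; the SNP identifiers and the tuple layout are invented for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(allele_1, allele_2):
    """True if the substitution swaps a purine for a pyrimidine or vice versa."""
    allele_1, allele_2 = allele_1.upper(), allele_2.upper()
    valid = PURINES | PYRIMIDINES
    if allele_1 not in valid or allele_2 not in valid:
        raise ValueError(f"not a single-nucleotide pair: {allele_1}/{allele_2}")
    return (allele_1 in PURINES) != (allele_2 in PURINES)


# Hypothetical SNP list as (identifier, allele 1, allele 2).
snps = [("snp1", "C", "T"), ("snp2", "A", "C"), ("snp3", "G", "A"), ("snp4", "G", "T")]

# Keep only the sites that cannot be mimicked by C->T / G->A damage.
kept = [snp_id for snp_id, a1, a2 in snps if is_transversion(a1, a2)]
print(kept)
```

The price of the filter is a much smaller SNP set, since transitions are the more common type of substitution.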

All this is very similar to what we are used to seeing in other comparable studies. However, it's not exactly the same, and I like to emphasize that slightly (or more rarely strikingly) different points of view on autosomal data, whose processing is always subject to the limitations of statistical analysis, are important to consider, because sticking too tenaciously to any single such POV may cause confusion and bias.

So I annotated the supplemental materials' version of the above PCA as follows:

I merely drew two axes of admixture: firstly one that is strictly parallel to the PC1 axis, which pretty much describes the axis of West Asian - Paleoeuropean admixture, using KO1 as reference. In West Asia it falls on the (Naqab) Bedouin B sample. While early Neolithic ancient samples approximate this axis (at a roughly 60:40 proportion), it is very apparent that they do not fall strictly on it but are rather spread above and below the main axis. Those falling above the axis may have minor extra EHG or other "Oriental" admixture, but many fall below it and so far we lack ancient references that could inform us about what is causing that deviation (although intuitively it would seem something African or at least "hyper-Mediterranean"). This tendency is reinforced in all the Middle Neolithic (Early Chalcolithic) samples, which also tend (along with Spain_EN) further towards the WHG polarity (something also apparent in the ADMIXTURE graph). So it is probably something that was in that WHG-like influence, which is however "too Mediterranean" for even La Braña.

Is it something from the original Balkan HG (Lazaridis' UHG) admixture? Iberian but distinct from La Braña? Is it something North African? Or something else? I can't say at this point; all I can say is that the references needed for that fine tuning are missing so far.

The other axis I annotated was something that the PCA was almost commanding me to do: the axis spanning from Samara_HG to Sardinia and the bulk of Early Neolithic samples was just there crossing right in between the bulk of modern European samples. In this graph, even if there are no Yamna nor Corded Ware samples, they seem not to be needed: Eastern European hunter-gatherers explain everything on their own. On the other hand Ma1 (alias ANE) is sitting up there, almost hidden between the legend boxes, not needed anymore to explain anything, because Samara_HG does the job much better.

So what is left of what some have called the fateful triangle of European admixture, first proposed by Lazaridis 2014 as a {x.EEF+y.WHG+z.ANE} formula? In this PCA at least it seems we can ignore the three elements and replace them with a better fitting {x.Spain_MN+y.Samara_HG+z.Bedouin-B} formula:

It would of course need further testing, but at least in this PCA it works very well and even most Sicilians and some Maltese fit within it, something that was not the case with the Lazaridis formula, whose triangle (not shown) runs between Stuttgart, Loschbour and Ma1 and would effectively exclude Sicilians and Maltese also in this PCA, as well as leaving huge empty areas towards the left (WHG) and top (Ma1).
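For readers who want to check whether a given sample fits inside such a triangle, the three-pole formula above can be sketched as barycentric coordinates on the PCA plane. This is only my geometric reading of the plot, not a formal admixture test, and every coordinate below is invented for illustration.

```python
def mixture_weights(sample, pole_a, pole_b, pole_c):
    """Barycentric coordinates of `sample` in the triangle of three 2-D poles.

    Returns (x, y, z) with x + y + z = 1. A negative weight means the sample
    lies outside the triangle, so the three poles cannot explain it.
    """
    (ax, ay), (bx, by), (cx, cy) = pole_a, pole_b, pole_c
    sx, sy = sample
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("the three poles are collinear")
    x = ((by - cy) * (sx - cx) + (cx - bx) * (sy - cy)) / det
    y = ((cy - ay) * (sx - cx) + (ax - cx) * (sy - cy)) / det
    return x, y, 1.0 - x - y


# Hypothetical (PC1, PC2) positions standing in for the three poles.
spain_mn = (-0.04, -0.02)
samara_hg = (-0.02, 0.08)
bedouin_b = (0.10, -0.03)
modern_sample = (0.00, 0.01)

x, y, z = mixture_weights(modern_sample, spain_mn, samara_hg, bedouin_b)
inside = min(x, y, z) >= 0
print(f"x={x:.2f} y={y:.2f} z={z:.2f} inside triangle: {inside}")
```

Running the same check with a different set of poles (say Stuttgart, Loschbour and Ma1) would show which samples each triangle leaves out.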

I've tried to do something similar using Starcevo (KO2) and Corded Ware or Yamna (not shown here) but they tend to exclude a much larger number of modern Europeans, not just Balkan peoples and Italians but also some Northern Europeans like Lithuanians. Such a triangle can of course explain many things but it's not so inclusive.

Of course, it does not need to be a triangle. In fact it is very likely that a fourth polarity runs between Basques and Cypriots towards the North Levant, going through Spain and Sicily. This axis does not seem to be explained by either Neolithic or Kurgan-like admixture since the late Chalcolithic and very probably indicates some other specifically Mediterranean source of admixture that at the very least influenced Italy and the Balkans.

In fact this cross-like structure of European autosomal genetics is invariably more apparent when only (or almost only) Europeans are considered. It is also apparent here but goes rather diagonally because of the heavy weight of the West Asian (and various Jewish) samples. 

In any case, I can draw on this PCA a rather obvious trapezoid that clearly includes all modern Europeans and also almost all Neolithic and post-Neolithic known ones:

It works much better than any triangle I could imagine. Within it I also drew (dotted lines) what seem to be the most important axes of European variation, which should roughly correspond to PC1 and PC2 in any Europe-only analysis. The first one runs between KO2 (Starcevo culture) and Samara_HG, and, as I said before, the figure itself was calling for it, being almost identical to the one I drew above. It is likely that the best reference is not quite KO2 but some Greek hunter-gatherer that still awaits to be studied and should be further down in the PC2. 

The other axis I drew is also demanded by the scatter of the graph and would roughly correspond to the PC2 of a Europe-only analysis. This axis is clearly displaced towards the Samara_HG polarity relative to the Neolithic and Early Chalcolithic European samples, which seems to suggest that this admixture corresponds to a time after the first Kurgan impact, possibly the Bronze or Iron Age.

However the period when Cyprus (which seems to be one of the polarities) would seem to play its most active role in the Mediterranean overlaps with the Kurgan expansion and only barely touches the Bronze Age, being rather Chalcolithic. One could also think of Phoenicians but it does not make much sense that Phoenicians had such an impact (they did not even colonize most of Sicily, let alone Italy or the Balkans). I don't have a good answer yet but my impression is rather of something from the Late Chalcolithic, early Bronze Age at the latest, and hence overlapping in time with the Kurgan impact but with a Mediterranean dimension.

Of course the Cypriot reference may be misleading and the real source may be something more loosely Anatolian (even if not quite like modern Turks either). In any case the Bronze Age (Late Bronze in the Eastern Mediterranean), when Mycenaean Greeks ruled the waves after the collapse of the Eteocretan civilization, seems a bit late. I'll leave it at that: the window for this change, which affected SE Europe especially, seems rather narrow, late Chalcolithic to early Bronze Age at the latest.

Revising the Aegean Neolithic genesis

Marnie's blog points today to a very interesting review of the Early Neolithic Aegean. It is from a few years ago and hence totally oblivious to the archaeogenetic information that we are now familiar with. It is however surprisingly consistent with it.

Agathe Reingruber. Early Neolithic settlement patterns and exchange networks in the Aegean. Documenta Prehistorica XXXVIII, 2011. Freely accessible PDF: LINK [doi:10.4312/dp.38.23]

ABSTRACT – The Neolithisation process is one of the major issues under debate in Aegean archaeology, since the description of the basal layers of Thessalian tell-settlements some fifty years ago. The pottery, figurines or stamps seemed to be of Anatolian origin, and were presumably brought to the region by colonists. The direct linking of the so-called ‘Neolithic Package’ with groups of people leaving Central Anatolia after the collapse of the Pre-Pottery Neolithic B resulted in the colonisation model of the Aegean. This view is not supported by results obtained from natural sciences such as archaeobotany, radiocarbon analyses, and neutron activation on obsidian. When theories of social networks are brought into the discussion, the picture that emerges becomes much more differentiated and complex.

Fig. 9. First appearance of Neolithic sites in the Aegean.

The overall picture that the author defends, which needs of course not to be the last word but is indeed interesting and well argued, is that of a relatively gradual transition from Epipaleolithic to Neolithic via maritime influxes, which obviously imply partial colonization but quite apparently assimilation of at least some of the pre-existent hunter-gatherer peoples in Greece (no evidence so far of Epipaleolithic or Mesolithic in West Anatolia).
"The oldest sites are in the Southern Aegean, with Crete and the Lake District, and date to the first half of the 7th millennium. They are followed by the Central Aegean sites in Thessaly and Western Anatolia, while the youngest sites were founded at the end of the 7th millennium in the Northern Aegean (Fig. 9). Astonishingly, in the Argolid, where there was a strong Mesolithic presence, long-lasting settlements appear comparatively late, around 6000 BC. The islands, as well as Crete, were (re)inhabited continuously only after 5500 BC."

"After a detailed examination of both the material culture and 14C dates, the model of a wave of colonisation sweeping over the Aegean as a whole must be rejected: that is, sites appear there at different stages in different landscapes."

The author then argues that only Knossos (Crete), Argissa and Sesklo (Thessaly), Ulucak (West Anatolia) and Bademagacı (Lakes Region of SW Anatolia) remain as well-dated Early Neolithic I sites in the whole region, adding that "interestingly, the sites in the Lake District are older the closer they lay to the sea", possibly supporting a coastal migration model.
"Therefore, the modelled 14C dates do not support the idea of direct colonisation from Central Anatolia, but testify to a marine-oriented population living in this area in the transition to the EN I."

Reingruber argues for Aegean networks originally dating to the Epipaleolithic (aka Mesolithic) and at least partial continuity from those pre-Neolithic peoples, something that would seem to be supported by the most up-to-date ancient genetic data, which suggest around 50% Paleo-European ancestry, possibly from the Balkans, in the "purest" early European farmers (EEF) such as samples from LBK or Starcevo, even before additional admixture happened towards the West.
"With this concept of regional and supra-regional networks based on the mobility of prehistoric people I do not argue in favour an exclusively autochthonous Neolithisation model. The input of the Anatolian/Near Eastern way of life in the Aegean is obvious. Many of the products and also the items used in symbolic activities were of Anatolian origin. Nevertheless, as has been shown, the Aegean ‘Bauplan’ displayed other priorities, the material culture differing from region to region. What I wish to stress is interaction based on face-to-face contact, on integration and social competence. Also a precise examination of the 14C dates argue against a demic movement ignited by a catastrophe at the end of the PPNB (compare also Thissen 2010.278)."

Very much worth a full read anyhow; I can only provide a glimpse here.

August 30, 2015

France's autosomal genetics highlight Gascon-Basque distinctive cluster

A rather decent analysis of French autosomal genetics has been privately pre-published recently (thanks to Jean Secques for calling my attention to it and to the lead author for making it available online). 

Aude Saint Pierre et al. The fine-scale genetic structure of the French population. Submitted to the American Journal of Human Genetics in 2015. Freely accessible: ref. LINK, direct PDF link.

A highlight of the study is that the samples all belong to people born in the 1930s and locations refer to their place of birth, so the results should be reflecting the historical demographics of the French Republic in the early 20th century. 

There are no supplemental materials available at this point, so it's only possible to get a glimpse of the general results and we can't go too much into fine detail. These general results are anyhow interesting. Let's see:

Figure 6: Prediction of geographic location of individuals from the test set (n=3,733) using multiple linear regression model. A) Expectation: The seven geographical regions of France according to the geographical coordinates of individuals in the test sample; B) Prediction of geographical coordinates according to the multiple linear regression model.

This figure alone synthesizes the findings: most French citizens cluster in a single unit, which geographically would correspond to NE France (GE region). Only SW French (Gascons and Basques mostly) deviate very clearly and roughly fit their own geography towards the Bay of Biscay (or Bay of Gascony, as the French call it). Some samples from the SE (MED and RA regions) also follow this trend. A few outlier samples from the East (GE, RA) look rather Rhenish German, although the lack of controls from outside the Hexagon does not allow me to confirm this impression.

You may have noticed that I ignored the IDF samples but that is because it is the Paris region (Île-de-France), which was already back in the 1930s too cosmopolitan to be informative. That is of course reflected in all the results with "orange" dots being nearly of all affinities. 
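For those unfamiliar with the method behind Figure 6, here is a minimal sketch of predicting a geographic coordinate from principal components by multiple linear regression. It is a plain ordinary least squares fit written without external libraries and is not the authors' actual model, whose details are not available. The training values are invented for illustration.

```python
def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_linear(pcs, target):
    """Least squares fit of `target` on the PCs plus an intercept."""
    design = [[1.0] + list(row) for row in pcs]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefficients, pc_row):
    """Apply the fitted intercept and slopes to one individual's PCs."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], pc_row))


# Invented training set: (PC1, PC2) per individual, with birthplace coordinates.
train_pcs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
train_lat = [48.0, 45.0, 49.0, 46.0, 43.0]
train_lon = [2.0, 0.5, 5.0, 3.5, 2.0]

# One regression per coordinate, then prediction for a held-out individual.
lat_model = fit_linear(train_pcs, train_lat)
lon_model = fit_linear(train_pcs, train_lon)
test_individual = (1.5, 0.5)
print(predict(lat_model, test_individual), predict(lon_model, test_individual))
```

A model of this kind pulls predictions towards the bulk of the training data, which may be part of why most individuals end up predicted in a single central cluster.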

Next come the principal component analyses, whose most salient information is again the peculiarity of Southwesterners, i.e. Gascons, Basques, and nearby populations.

Figure 2: The scatter plot of the first three PCs from PCA performed on the SNP genotype data of the 4,433 individuals from the 3 Cities study. Individuals are coloured according to the region where they were born. (Note: the legend corresponds to both PCAs)

Other than the "Gascon" specificity, which takes over PC1, I'd say that PC2 shows an "anti-Mediterranean" tendency and that PC3 instead shows a "pro-Mediterranean" tendency. This I gather from the relative position of the "red" MED cluster. Both components carry the same weight.

Interestingly there is a prominence of the GO region (Mid-West between the Seine and the Garonne), which may indicate some sort of "Armorican" or "Briton-like" specificity. In appearance it could blend both the "pro-" and "anti-Mediterranean" tendencies but, without being able to discern the particular dots (ID and location), I cannot swear to that.

Much clearer is the pattern among Gascons, Basques and allies: they show an "anti-Mediterranean" tendency when they are strongly detached from the main French cluster, whereas the closer they are to mainstream French, the more they show a "pro-Mediterranean" tendency, overlapping at the extreme with the MED cluster. This happens in both PC2 and PC3.

Little more to say, honestly. Maybe that the small Eastern group of outliers prominent in "anti-Mediterranean" tendency in PC2 probably corresponds pretty well with the outliers of the first graph, which looked German-like. So I guess that the positive side of PC2 probably corresponds with a Northern European tendency.

I am particularly interested in what you have to say on this one, reader.