March 3, 2013

On the plausible ages of human Y-DNA haplogroups

When I discussed the new A00 Y-DNA lineage yesterday, you may have found it strange that I accepted the rather extreme age estimates when I am usually very critical of "molecular clock" age estimates.

The reason is that my main criticism is aimed at the all-too-common estimates (for Y-DNA) based on a handful of STR markers, which seem to be mere pointers and not the real stuff. The real stuff is in the SNPs, but the problem is that, right now, we know only a few of the many that must be lurking in the actual Y chromosomes that all men carry in their cells. As the exact number of SNPs in any given lineage is unknown, there is no way of counting them and thereby establishing the relative chronology of the Y-DNA phylogeny.

However, this will change soon, if it is not already changing, because full chromosome sequencing gets cheaper every day, and that is what has made possible, for example, the 1000 Genomes Project. A few weeks ago, an open access paper by A. Van Geystelen et al. emphasized this immediate future, in which our knowledge of the Y-DNA tree will almost literally explode.

But the fact is that there is at least one person who has already been working in the open with that perspective in mind. Terry D. Robb, at his site mostly dedicated to haplogroup I1, published some months ago a (partial) Y-DNA tree of Humankind (update 10 or this PDF) that calculates relative haplogroup ages based on nucleotide differences between the Y-chromosome sequences of the 1000 Genomes Project.

It is still not rocket science but it is getting quite close. A problem may be that some haplogroups are too thinly represented (actually only R1b, O3 and E1b have large enough samples to be fairly safe about them internally) but while this may be a problem for coalescence ages (because maybe key sublineages are not represented), it should not be at all for divergence ones, i.e. the node where they separate from their closest relatives. So we cannot be certain that the apparent coalescent relative age of, for example, haplogroup D (n=17) is correct but we can be quite certain that the relative age of its divergence from its "brother" E at the DE node is good. 

The other and main problem is calibration. And this is the only (albeit important) aspect where I disagree with Robb's method. He insists on scholastically using academic references from only the population genetics literature (which systematically produces too recent "ages") and ignores archaeological references altogether. Therefore I have taken his graph and modified that part as follows:

[Figure: Robb's Y-DNA tree with recalibrated ages]
As you can see, I calibrated by equating age(CF) = 80,000 years ago. This is based on Petraglia 2007 and other materials that establish that there was a modern human presence in South Asia since c. 80,000 BP, before and after the Toba ash layer (74 Ka BP). One minor doubt is whether that date would better correspond to Y-DNA F (which in my calibration above shows up right after the Toba event), but that would have made all ages even older (not really a problem for me, but I prefer this calibration).

The result is somewhat shocking because it pushes the age of A0 to c. 265,000 years ago, making it effectively pre-Sapiens. Relatedly, it would push the age of A00 (assuming everything else in the Méndez paper is correct) to c. 450,000 years ago. But after the initial surprise... why not? After all, most of us have Neanderthal admixture, and they must have diverged a million years ago or earlier. And some peoples even have minor Homo erectus admixture most likely, diverging some 1.8 Ma years ago probably.
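
To make the calibration step explicit, here is a minimal Python sketch. It assumes that the tree supplies relative (unitless) node ages and that a single anchor node fixes the scale. The `calibrate` function and the `relative` dictionary are illustrative names only, and the A0 ratio is simply back-calculated from the figures quoted in this post (265,000 / 80,000); it is not taken from Robb's table.

```python
def calibrate(relative_ages, anchor, anchor_age_years):
    """Scale unitless relative node ages so that `anchor` gets `anchor_age_years`."""
    scale = anchor_age_years / relative_ages[anchor]
    return {node: rel * scale for node, rel in relative_ages.items()}


# Relative ages in units where CF = 1.0. The A0 value is an assumption:
# it is the ratio implied by this post's own numbers, not Robb's raw data.
relative = {"CF": 1.0, "A0": 265_000 / 80_000}

# Anchor the CF node at 80,000 years ago, as in the calibration above.
calibrated = calibrate(relative, "CF", 80_000)

for node, age in calibrated.items():
    print(f"{node}: c. {age:,.0f} years ago")
```

Choosing a different anchor, or a different age for it, changes only the scale factor: the ratios between nodes stay as the tree gives them.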

So, well, these are my reasons and this tree by Terry D. Robb, with my own chronology as above, is probably the best and most realistic estimate around of Y-DNA haplogroup ages. 

Enjoy.

19 comments:

  1. "And some peoples even have minor Homo erectus admixture most likely, diverging some 1.8 Ma years ago probably."

    Eh???

    Oh, are you referring to the hypothesis that Denisovans had some H. erectus admixture (notably in the mitochondrial lineage)? Did any of that survive into living humans? (I know the mitochondrial lineage didn't.)

    Replies
    1. Exactly my point. Papuans, Australian Aborigines and, to a lesser extent, some other ancient peoples of their vicinity (other Melanesians, Polynesians, Filipino Negritos) have obvious Denisovan admixture, which I think is probably only about half non-Neanderthal and corresponds to Asian H. erectus, which they may have encountered in Sundaland or Wallacea.

      See in this blog:
      http://forwhattheywereweare.blogspot.com.es/2012/01/denisovan-admixture-may-actually-be.html
      http://forwhattheywereweare.blogspot.com.es/2010/12/explaining-denisovan-and-also.html
      http://forwhattheywereweare.blogspot.com.es/2011/09/denisovan-admixture-widespread-beyond.html

      There could also be some much more diluted such admixture in some SE Asians:
      http://forwhattheywereweare.blogspot.com.es/2011/11/minimal-denisovan-admixture-in-se.html

  2. The 1000Genomes Project used 1x coverage. Usually, the sequencing is done at 30x coverage, keeping in mind I barely know what I'm talking about here. I think each "coverage" misses a lot of SNPs, maybe half of them, that's why 30x coverage is necessary to ensure reduction of errors to a minimum acceptable level. I presume 1x coverage is vastly cheaper than 30x coverage and is what allowed the Project to sequence a whopping 1000 chromosomes, while other studies struggle to fully sequence the 23 chromosomes of just 1 person. The conclusion is that the 1000Genomes Project must have missed a huge amount of SNPs. It would be best to select a few carefully chosen y-chromosomes and then do a proper 30x or 60x coverage of them.

    Dienekes also made some calculations when the 1000Genomes data came out last year. Using a mutation rate of 1.25 x 10^-8 per 25-year generation, he got a date of 6400 years for the separation of R1b-U106 and R1b-P312. He picked the mutation rate from a study by Kong about mutation rate variation in fathers of different ages. It was an important study, because the just released study of y-dna A00 which prompted you to write this post also stated they used the Kong mutation rate. But looking anew at the Kong study, I think the mutation rate is actually slower than 1.25 x 10^-8 per 25-year generation. The study says "average father's age of 29.7, the mutation rate is 1.20 × 10^-8 per generation", which I believe would mean the 25-year generation rate would be just 1.00 x 10^-8. If true, Dienekes' original estimate of 6400 years would be more like 8000 years.
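
The rescaling discussed in this comment can be sketched in Python. This only makes the commenter's proportional reasoning explicit: it assumes the per-generation rate scales linearly with generation length and that an age estimate is inversely proportional to the rate used. The function names are illustrative, not taken from any of the studies mentioned.

```python
def rate_for_generation_length(rate, gen_years_from, gen_years_to):
    """Convert a per-generation rate to a different generation length (linear assumption)."""
    return rate * gen_years_to / gen_years_from


def rescale_age(age_years, rate_used, rate_new):
    """Rescale an age estimate when swapping mutation rates (age is proportional to 1/rate)."""
    return age_years * rate_used / rate_new


# Rate quoted in the comment: 1.20e-8 per generation at a mean paternal age of 29.7 years.
kong_rate_25 = rate_for_generation_length(1.20e-8, 29.7, 25.0)

# The 6400-year estimate was obtained with 1.25e-8 per 25-year generation.
age_rounded_rate = rescale_age(6400, 1.25e-8, 1.00e-8)
age_unrounded_rate = rescale_age(6400, 1.25e-8, kong_rate_25)

print(f"rate per 25-year generation: {kong_rate_25:.3e}")
print(f"age with 1.00e-8: {age_rounded_rate:.0f} years")
print(f"age with unrounded rate: {age_unrounded_rate:.0f} years")
```

Under these assumptions the converted rate is indeed close to 1.00 x 10^-8, and the adjusted date lands at roughly 7900 to 8000 years depending on rounding.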

    Replies
    1. Well, it should still be considerably better than STR-based calculations, which so far are extremely unconvincing. Even in the best cases (with Zhivotovsky rates) the results don't make much sense, especially when looked at on a global level.

      I prefer not to pick any mutation rate but simply to calibrate the results using a most likely reference (or, if you wish, several) of material (archaeological) significance. In general, all age estimates need some sort of calibration. In many cases a Pan-Homo divergence figure has been used, but that figure (5, 6 or 7 Ma) is on average about half of the real one, which is surely 8-13 Ma.

      http://forwhattheywereweare.blogspot.com.es/2010/11/chimps-and-humans-divereged-some-eight.html
      http://leherensuge.blogspot.com/2008/04/new-paper-ofn-chimpanzee-and-bonobo.html
      http://forwhattheywereweare.blogspot.com/2010/11/chimps-and-humans-divereged-some-eight.html
      http://forwhattheywereweare.blogspot.com/2010/10/very-low-mutation-rate-mwahahaha.html

    2. I agree. What's more, one of the things that caught my attention the most when analyzing the 526 y-chromosomes was the high age estimate of E1b1b-M81, the clade that stands out precisely for having a very low STR diversity. Its age estimate was 7500 years, even higher than the R1b-U106 / R1b-P312 split. There were 3 E1b1b-M81 samples, from Puerto Rico and Mexico.

      By the way, I would really appreciate it if someone knowledgeable could give me an answer on my second paragraph. Am I right to think the mutation rate as stated in Kong is 1.00 x 10^-8 per 25-year generation, as opposed to the 1.25 x 10^-8 per 25-year generation rate that Dienekes inferred from that same study and used in his estimates?

    3. I really do not know what the Kong mutation rate is, sorry.

      As for E1b1b-M81, personally I would not be surprised if it is very old, because it's almost only found in NW Africa, with a refuge zone tendency for the higher frequencies. An issue may well be that STR markers that work relatively OK (with all the caveats mentioned above) for K subclades could be horrible markers for other lineages such as E, O or whatever. I really wonder if their behavior varies from lineage to lineage and especially between macro-lineages.

  3. The A00 lineage is extremely easy to identify by haplotype, having many STR values that are almost "illegal" in all other haplogroups. Maju pointed out that it's only been found in 2 people, one from FTDNA where the existence of the A00 lineage was first detected and another from coastal Cameroon. I presume this other sample is from the smgf.org database, because there are in fact 9 samples there, all from Cameroon, who are unquestionably members of this clade. I'd like to know more about how they detected this other sample and where they got it from.

    Finally, there's yet a 3rd sample that hasn't been noticed yet. It's from yhrd.org. The sample is from "France, Paris [French]". About half of these samples are North Africans and sub-Saharan Africans, so it makes perfect sense. So unfortunately, we have at least 3 occurrences, but only 1 of them has a precise regional location in Africa.

    Regarding the extreme rarity of A00: after looking at what must have been about 15,000 sub-Saharan y-dna samples, from yhrd.org, smgf.org, and FTDNA, and finding only the above noted 3 occurrences (I still have to see what's the deal with those 9 Cameroonian samples: are they all relatives, are they all from a single village?), A00 has a frequency of 1 in 5,000! But actually, that's not so exceptional. I have seen many clades in Europe with a frequency of just 1 in 3,000 or less, such as the rarer R1a* haplogroups, for instance.

    Replies
    1. Nine Cameroonians and one African-French! That's important news, thanks.

      I could find almost no info on the Mbo people. Wikipedia says in a few lines that they are very poor forest people (but not specifically Pygmies), apparently foragers, but also that many among the nobility of other tribes of the area claim ancestry from them. I know that in West Africa traditional monarchs are often from a different ethnicity (maybe because that makes them more neutral?) and that in many places a certain deference was given to the first peoples recognized by tradition. So I imagine that the Mbo are a deeply rooted local ethnicity, probably pre-Bantu. But I can't say anything more.

  4. A fairly accurate archaeologically calibrated date of 265,000 years ago is close enough to the archaeologically earliest date for H. Sapiens to be plausible. A date of 450,000 years ago really points strongly to archaic introgression, which, if accurate, would be the only time it has ever been documented in a uniparental line. Given the suggestion of archaic admixture from both 13,000-year-old, possibly hybrid, skeletal remains and from traces of possibly even a couple of archaic admixture episodes in Africa in autosomal genetics based on linkage disequilibrium analysis, one in the region where we have samples and the other in Southern Africa, at about the same time frame, the case that this is introgression, while not rock solid, is fairly plausible.

    Replies
    1. Other than for challenging the mental schemes, it does not matter much if a lineage is sapiens or whatever or where the blurry line of "sapiensness" is placed. I place it around the age of Omo because in that site there are two skulls and only one of them looks Sapiens, being the oldest one of its kind. But I guess it is as arbitrary as any other line.

      But in any case, I must say that you don't seem to be using the term introgression properly. Introgression is when a small reproductive contact leads, by means of positive selection, to large impacts. I know it's a cool-sounding word but it should be used correctly, and a lot of people do not.

    2. According to this definition: "Introgression, also known as introgressive hybridization, in genetics . . . is the movement of a gene (gene flow) from one species into the gene pool of another by the repeated backcrossing of an interspecific hybrid with one of its parent species."

      When a non-sapiens gene enters the modern human gene pool via an interspecies hybrid individual who has an archaic hominin father and a H. Sapiens mother, this is introgression, according to that definition even if the size of the impact turns out to be modest.

      Also, if A00 is found in one in 3,000 sub-Saharan Africans, as may be the case, out of perhaps 600,000,000 sub-Saharan Africans alive today, for a total of 200,000 individuals give or take in 2013, based on what could easily have been a single hybrid individual 13,000 years ago, this would still qualify as introgression by your more restrictive definition.

    3. Then you are right and I stand corrected. I really thought "introgression" was only meant for genetic material that had undergone a selective sweep after admixture, but it seems not.

      ... "if A00 is found in one in 3,000 sub-Saharan Africans"...

      So far it has only been found in Cameroon and probably in a single population within this most diverse country. Were it as common as you say, it would have been found much earlier.

  5. Can anyone tell me why the generation length is always 25 years? Did some of you make your own family tree? I have 10 generations in my family tree and the average generation is 33 years. You can say that I am strange, but please take a look at Confucius' family tree: it has 83 generations, and its average generation length is also over 30 years. So my opinion is that the age of the Y-chromosome tree is underestimated due to the underestimation of the average generation length.

    Replies
    1. Better use 30 years. The rationale behind short generation length estimates was that "primitive" peoples with a relatively short life span and without the constraints of a lengthy educational process would give birth earlier than "civilized" peoples; also, 25 is convenient because it's 1/4 of a century. However, from what I read years ago, this has been contested by anthropological studies that find that a more accurate estimate is in the late 20s or even higher (I have a memory of 29 years but can't recall if for men or women) because, even if they do begin with childbirth a bit earlier, they keep having children until menopause → 45-15=30 → 30/2=15 → 15+15=30. So 30 years seems a good standard, more so for patrilineages.
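
The back-of-the-envelope arithmetic in this reply can be written out in Python. It assumes births are spread evenly between the first and last reproductive ages (15 and 45, as in the reply), which is a simplification, and that an age expressed in years is just a generation count multiplied by the generation length. The function names and the 10,000-year figure are purely illustrative.

```python
def mean_generation_length(first_birth_age, last_birth_age):
    """Midpoint of the reproductive span, assuming births are evenly spread over it."""
    return first_birth_age + (last_birth_age - first_birth_age) / 2


def rescale_for_generation_length(age_years, gen_years_used, gen_years_new):
    """Re-express an age that was computed as (generations x gen_years_used)."""
    return age_years * gen_years_new / gen_years_used


# 45 - 15 = 30; 30 / 2 = 15; 15 + 15 = 30
generation = mean_generation_length(15, 45)

# Hypothetical round number: an estimate of 10,000 years obtained with
# 25-year generations, re-expressed with the generation length found above.
rescaled = rescale_for_generation_length(10_000, 25, generation)

print(generation, rescaled)
```

The point of the second function is only that, for a fixed count of generations, any estimate built on 25-year generations grows by a factor of 30/25 when 30-year generations are used instead.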

    2. The most common time period that I have seen used to estimate generations in population genetics is 29 years which seems to produce some decent estimates. Ultimately, the best answer is to calibrate with the most data available (in some cases making a calibration that simultaneously estimates multiple variables like mutation rate and generation age at the same time).

      The notion of using start of menses to menopause, with some adjustments at each end for reduced fertility (for social or biological reasons) at the extremes, recognizing that generations can be drawn from any birth order child, is a decent back of napkin place to start calculating. Also, it is worth recognizing that menses onset is earlier in prosperous times and later in lean times.

      Maju's observation that generation age may be longer for non-recombining Y-DNA than for mtDNA (or autosomal DNA drawn from both parents), because men have a longer period of fertility than women (and also, I would note, because men who attain high status at an older age may have higher lifetime number of children who survive to the next generation than other individuals in paleolithic societies), is relevant, however, and could help to explain discontinuity between estimates of species formation or haplogroup divergence ages based on mtDNA and Y-DNA.

      To illustrate this, suppose a band chief has one son by his first wife, and a second son by his wife's niece whom he married after his first wife dies. If this kind of thing happens even uncommonly, the Y-DNA generation gets to be longer on average than the mtDNA generation.

    3. There are many issues here: women in forager conditions may have an elevated childbirth-related mortality, even if they are surely stronger and sturdier than our city dwellers in most cases. On the other hand, in some cultures female childbearing really begins almost at puberty. But not in many other cultures, probably the majority. I would think that for matrilineages the realistic generation length is somewhere between 25 and 30 years, maybe 27-28. But for patrilineages it is surely a bit larger, 30 or so. 25 has the odd advantage of being convenient (four generations per century), as has 33 (three generations per century), but the reality is surely in between.

  6. "When a non-sapiens gene enters the modern human gene pool via an interspecies hybrid individual who has an archaic hominin father and a H. Sapiens mother, this is introgression"

    I think something that apparently separated from the 'human' line a mere half million years ago can hardly be regarded as a separate 'species'. 'Subspecies' at most. And I mentioned in the other post that I think the modern human Y-DNA originated in West Africa in the first place. A00 represents exactly what it looks like: an ancestral form of the modern Y-DNA.

  7. Terry, may I ask you if you are the same Terry D. Robb who has been working on haplogroup I1 or just someone else with the same name? If you are the author, I would like to ask you a question, if you don't mind.

    Replies
    1. He's not. Different surname and clearly different people.

