RootsChat.Com
General => Ancestral Family Tree DNA Testing => Topic started by: 4b2 on Wednesday 17 July 24 12:31 BST (UK)
-
Does anyone know what the level of false positives for DNA matches on these services is?
I am aware that Ancestry has a system of stripping out shared DNA from matches that they feel is not inherited via shared ancestry. And it seems to me that this strips out most false positives.
-
That's a good question and I don't know the answer, but like you I await any insights!!
However I have recently come across an Ancestry Match of 60 cM which patently was not ancestral. One party was from a totally Jamaican non European family and the other was from a totally Gloucester, England family. I have another of 100 cM which I am quite sure is not ancestral but I have not "proven" it yet.
60 cM seemed high to me, although I regularly see Matches of up to 30 odd cM across the many accounts I manage which I am pretty certain are not ancestral.
-
I suppose it's a bit hard to prove when it's possible to be related to these non-related groups via NPE. I've got a few groups in my family that bewilder me but I've always figured that I just haven't found the connection.
-
My take is there are very few false positives on Ancestry, at least at >= 30cM. I use the search button to quickly look up dead ends in matches' trees, which allows you to patch most of them. Others might need a little digging, and some can't be done without ordering records. If you ignore the ones with no tree or are indeterminable, then I have clusters where I've found the common link. So my take is most if not all above 30cM are real relatives. Not sure about the lower ones.
When I contacted many on Living DNA I found many said they had a tree on Ancestry, where we didn't match at all in many cases. Living DNA matches are at least half false positives in my estimation. Ancestry do a very good job of stripping them out.
As for NPEs, in the medieval time about 1% of children were born out of wedlock, then in the 1500s it went up to about 2-3%, down to 1% in the 1600s, up to 5% in the 1700s and down to about 3-4% in the 1800s. Two studies have put the rate of infidelity at around 1.35%. So in the time period autosomal tests are relevant we are looking at about 5% chance of any birth being an NPE.
We can use DNA to prove our own lines are correct; that leaves the other party's line. At fourth cousins that is five births, with around a 25% chance of an NPE among them if we take the numbers above. NPEs seem to cluster, due to the morals (or lack thereof) of the family. But I think it's fair to expect around 25% of a ~4th cousin cluster to be coming from an NPE.
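As a rough check on that arithmetic, here is a minimal sketch. It assumes every birth in the line carries the same flat 5% NPE rate and that births are independent of each other, which is a simplification given the point above that NPEs tend to cluster in families. The function name is made up for illustration.

```python
def chance_of_at_least_one_npe(births: int, per_birth_rate: float = 0.05) -> float:
    """Chance that at least one of `births` is an NPE.

    Assumes the same rate for every birth and independence between
    births (both are simplifying assumptions, not established facts).
    """
    # Chance that every single birth is as recorded, then take the complement.
    chance_all_as_recorded = (1.0 - per_birth_rate) ** births
    return 1.0 - chance_all_as_recorded


if __name__ == "__main__":
    for births in (1, 3, 5, 7):
        pct = 100 * chance_of_at_least_one_npe(births)
        print(f"{births} births: {pct:.1f}% chance of at least one NPE")
```

For five births at 5% this comes out a little under 23%, so "around 25%" is the right ballpark under these assumptions.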
If it's illegitimacy, it's more obvious. Using overlapping locations, triangulation of matches and Ancestry's clues spotting a match by illegitimacy is not too difficult.
Infidelity can be a bit more tricky, as you don't have the gaping hole births out of wedlock tend to leave in a tree. But I have found two.
1) I found a match with a birth out of wedlock with the name Harry Roberts. The test subject was on the same generational level as my great-grandfather and overlapped with one of my clusters where there were only two children who survived to adulthood over two generations - one named Harry Roberts.
2) My best friend's father took the Ancestry test. He shared the matches and the first thing I noticed was that the first surname Ancestry suggested from his shared matches was an unusual one from which I descend. I began sorting his matches and soon came across a 56cM match who I knew was also a descendant. As I continued I found more and more familiar matches, all from this line of my ancestry. His father did not match my immediate family's tests, but in total there were over 200 matches in the cluster where the common ancestors are also mine. So I was able to pinpoint the shared ancestry from my tree and the rough point where it intersected with my friend's father. I found that at that point there was a Mr. Morgan who married a Miss Roberts. There were DNA matches for the ancestry of Roberts, but zero for Morgan. And Ancestry trees had many with the Morgan family in them, which I use as a measure of whether there will be tested descendants. I found that the first two children born to Mr. Morgan and the former Miss Roberts were actually the children of my 4X-great-grandfather, who lived about 2 miles away. The following year he moved 20 miles away and it does not appear any ensuing children were his.
If one digs deeply enough into the matches, and catalogs them thoroughly based on the flow of genes, one can root out NPEs. But the upper limit of being able to do so would probably be births around 1800.
-
SouthseaSteel,
I'm not an expert but I have been studying genetic genealogy for several years and if anyone was able to demonstrate that matches of 60 or 100 centiMorgans were invalid, it would turn the industry on its head. I would even say the same about matches as high as 30. I think you are trying to deny the truth.
Zaph
-
I would show you the photos of the people involved but it would be inappropriate!!!
And don't worry, I'm never in denial when it comes to DNA research!!
I'll grant you the 100 cM one is unresolved and I'm prepared to defer on that, but not the 60 cM one, unless the other party is spoofing pictures, details and online documents etc.
-
I add that I acknowledge that 60 cM is extremely anomalous and the accepted range of false positives is much lower, at 30 cM max and more like 15 cM and below. I also acknowledge that 60 cM could be a 6th cousin, but we aren't interested enough in it to prove it one way or the other!!
-
I have an 80cM match whose online tree goes back to all his sixteen 2xgreat grandparents, and some branches further back than that. From his tree I could find no connection with me at all. All his ancestors were from Cambridgeshire / East Anglia. From our shared matches, the only line of mine where we could converge was from West Berkshire. I contacted him to ask whether he had any ancestors from this area, but he confirmed he had none. But when I checked his tree in more detail, I discovered his mother was born 3 years after his supposed grandfather's death (he left the "grandfather's" death blank on his tree). The "grandfather" was KIA in 1917. The record on CWGC gives the name of the soldier's widow, residence and names of his parents, so no chance it was a different chap. Thinking I had uncovered something new, I contacted him about his grandfather. He said he knew all about it.
So there's an example where someone knowingly posts an erroneous tree. There also might be a case where someone is adopted, but does not know about it, so they have unwittingly built an incorrect tree.
-
In the days before divorce was available to most people, before IVF or assisted conception, before babies born in hospital had identity tags, before adoption was formalised and probably other circumstances, people had to make private arrangements.
When we say "non-paternal event" we tend to think of infidelity, rape etc but people could take in parentless children, or the children of sick relatives, or children homed out by the parish...in other words for good or altruistic reasons.
Still a mystery at the moment for Southsea Steel but could be a lovely uplifting story behind it.
-
Interesting story.
Also, people make errors in their trees. I had a relative on a cluster for the surname Smith. They had Smith ancestry in Liverpool, where several sons of this family moved. But it did not link up. So I looked into it myself and found they'd made an error and that their Smith line went back to my Smith line.
-
4B2 - My impression is same as yours. Anc matches above a certain level are usually through a common ancestor. No doubt Timber helps filter out the matches by chance. Always wonder if it filters out some of the real ones as well. ;D
My experience with MH is different. My impression is there are higher level (>30cM) matches that are not via a common ancestor. I've even seen a triangulated match above 30cM on MH which on Anc is no match and we are both sure MH is wrong. When I posted on that match here there were several replies with similar experiences on MH. Caveat - I uploaded my raw data from Anc to MH so that may be a factor.
-
I think Timber does probably cut out some. I have a cluster of matches on Ancestry with some of the same people on MyHeritage, and other people related to them. I have about ten tests for that line and they appear as matches in some of those tests, but not others (on Ancestry), but they always appear on MyHeritage.
I also think there are false positives on MH at a higher level. Not sure what point they may really start creeping in on Ancestry. But I feel even matches around 20cM appear to be very relevant.
-
Currently I am not aware of any accurate method of determining False Positives.
Certainly not at the level of cM being quoted.
Yes they can be Grouped.
Yes you can use the basic DNA tools on My Heritage but even then the Triangulation Tool gives spurious results, it certainly has for me with three DNA matches each a known Cousin of each other yet I have an indeterminate result. Their Grouping Feature is something of a mixed bag in determining relationships and only gives the most likely matches that are probably going to give the desired results.
Ancestry is useless for any form of Chromosome Browsing.
The Leeds Method is not reliable below 90cM.
As it is my false matches are all in the below 10cM shared DNA level and the four led me on a wild goose chase for a while until I determined that the branches simply should not exist in my tree.
There are probably too many ifs and buts to get meaningful answers
-
To get back to the OP I did find this in some online academic journal FWIW:
"Segments under 10 cM have about a 15% chance of being false (i.e. not inherited from mother or father)
Segments under 5 cM have about an 85% chance of being false"
But as noted here, always fraught to identify "wrong" data
-
Hi,
That would gel with what I am finding.
Pretty certain that sub-20 cM is a match, and I have sub-10 cM matches with proven links in the 1770s at 8 generations.
I wonder how much of the difficulty at that distance is pedigree collapse from before the records begin, which makes a real link nearly impossible to prove. I have a few like that: surnames link and DNA matches, but very hard to "prove" by documentary evidence.
-
Could you identify the journal or link to the article? Thanks
-
A quick re-research found this
https://whoareyoumadeof.com/blog/can-dna-matches-be-false/
but I saw it in some other "more formal" report too which I will endeavour to find .... again
-
A good read thanks for posting.
If only we had better chromosome features in Ancestry it would make life so much better in filtering the DNA matches into a “priority pecking order” for research.
I agree with their writings.
-
Thanks for the link Southsea Steel. A really interesting article. Very surprised at the figure for 10cM matches ie 15% chance of being false. That means 85% chance of being IBD.
-
MyHeritage has a very high level of false positives. I posted a link here some time ago to someone who had gone through their own and their parents' matches: a very large percentage of the people who matched him did not match either of his parents, I think it might have been over 40% of those under 40cM or so. MyHeritage has the same problem that GEDMATCH and other sites have in accepting uploads from different providers, which use different chips at different times and test parts of the DNA which in some cases only overlap by a small amount. So GEDMATCH and MyHeritage use algorithms to guess, which unfortunately mostly seem to produce nonsense matches. Ironically I think you may be more likely to get accurate matches on MH by uploading Ancestry DNA, as that better matches the original FTDNA/MH SNPs than the new GSA chip that MH and FTDNA have used since 2019 (23andMe since 2017). The same applies for GEDMATCH, whose database still uses the SNPs of the original FTDNA kits, which produces wildly nonsensical matches for most GSA kit uploads.
-
Here is my post where I summarized this person's findings (unfortunately the website is defunct)
https://www.rootschat.com/forum/index.php?topic=850885.msg7190496#msg7190496
"This person did a detailed analysis of their matches compared to the parents, and concluded that any match that has segments of below 26cM in size may be IBS, and that 1/3 of those with segments under 17.6cM may simply be false matches possibly due to the MyHeritage algorithm not working correctly. Note though, that these were all LivingDNA uploads which test different but overlapping parts of the genome which may make the problem worse. So it might be that MyHeritage-MyHeritage matching is more accurate, but unfortunately I don't think the matches are marked with their origins (unlike GEDMATCH) so there is no way of judging this."
-
In case you were wondering!
https://isogg.org/wiki/Identical_by_state
-
Do you have any insight into whether false positives don't form themselves into clusters? For example, on Ancestry you will have a cluster of matches, where many members overlap with each other. If you research the trees you will often find a common ancestor, even if you can't find your link to that. I'd say in my best clusters I can find a common ancestor in up to about 40% of matches down to 8cM. Obviously we can't identify all of them, and then another 20% or so will be via NPEs. So, I think over all generations we can assume at least two thirds of clusterable Ancestry matches down to 8cM are real relatives.
My guess is that when the overlap is via coincidence, rather than ancestry then it would be much less likely to show up as a shared match in a cluster. Maybe these inherited by coincidence are formed of DNA from more than one line of descent that just happen to be equal to someone else. It would make sense, if Ancestry are able to strip out many false positives.
-
I (and I assume most people) have many clusters I can't place (yet) :). In fact, using the new Ancestry Pro Tools it is very much easier to find them. Previously you were limited to searching for the same name and places on linked trees and hoping for the best! Most of them are in America, and likely just reflect a couple of things: (i) a much higher percentage of the US population has done DNA tests than anywhere else, and (ii) it can be very difficult for Americans to trace back to the correct line in the UK or elsewhere, since US censuses don't list exact birthplaces, either in the US or abroad (although they do list the state for US-born people), and many earlier baptisms or births have left no record. A few states have death certificates that help by naming both parents, but in most cases these only came in at the end of the 19th century or later, so won't help with earlier immigrants. In fact DNA may be the only way in some cases that it will be possible to work out a link. But it depends on whether someone has already done the in-depth investigation and made the link back to the UK that might then fit into your tree. Hopefully these Pro Tools will make it possible for more of our cousins over the Atlantic to do so!
-
Yes. Pro Tools is proving invaluable. I have one match who I think represents one half of a relationship via infidelity. The match is about 100cM, so I'm expecting around a 3rd cousin. The only thing I have to go on is their name, which is a common forename + the surname Evans. But with Pro Tools, I can see that two common matches are closely related. So I can pad their trees out and probably find out who that key match is, with a very good idea who the common ancestor is. Plus, I've already found about 10 matches with common ancestry in that cluster. So there is a good amount to work with.
With the US, most of my ancestry is Welsh, and thus many relatives began moving to the Welsh areas in places like Pennsylvania and Ohio from around 1800. So, I do find quite a few matches where I can see where the connection comes from, where people have US dead-end ancestors and it's something like Owen Davies born 14 Mar 1831 in Wales and his wife Grace Thomas born 18 May 1834 in Wales. As you note, the census doesn't list the place of birth, and even if the death certificate does list the place of death and parents, many don't know that or don't know where to find it.
As noted in another thread, I download all my matches, from my now 23 tests into a database and then compare related ones against each other. Pre-Pro Tools this allowed me to pick out some much deeper matches that you couldn't get a handle on using the Shared Matches tab. If you have ten related people, they will all obviously have inherited portions of DNA that reveal links others do not. This is useful as you end up with various unknown clusters from each match. This is a quick way to show which of those clusters overlap with other matches and are thus probably relevant.
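The cross-test comparison described above might look something like this. It is only a sketch, assuming each test's match list has already been loaded into a dictionary of match ID to shared cM; the kit names, match IDs and function name are invented for illustration and are not anything Ancestry provides.

```python
from collections import defaultdict


def matches_shared_across_kits(kits, min_kits=2):
    """Return the matches that show up in at least `min_kits` of the kits.

    `kits` maps a kit name to a dict of {match_id: shared_cM}.
    The result maps each qualifying match_id to {kit_name: shared_cM}.
    """
    seen = defaultdict(dict)
    for kit_name, matches in kits.items():
        for match_id, cm in matches.items():
            seen[match_id][kit_name] = cm
    return {m: by_kit for m, by_kit in seen.items() if len(by_kit) >= min_kits}


if __name__ == "__main__":
    # Hypothetical data: three related tests and their match lists.
    kits = {
        "kit_a": {"m1": 30.0, "m2": 12.0},
        "kit_b": {"m1": 14.0},
        "kit_c": {"m3": 9.0, "m1": 11.0},
    }
    for match_id, by_kit in matches_shared_across_kits(kits).items():
        print(match_id, by_kit)
```

A match that turns up in several related kits, even at low cM in each, is then a candidate for the same unknown cluster.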
As an example, one test has a cluster of about 200 matches with 30cM being the highest. Some of those matches appeared in three other tests, but all below the 20cM threshold. And now with Pro Tools you can see that it's the same people clustering with all of them. If I look at the shared matches it's mostly people assigned to the same cluster. Conversely, when I now look at teen-level cM (13-19cM) matches that previously showed nothing under Shared Matches, I can see that many of them have no discernible common group. You can get a match who has common matches from four or so groups. In such cases I am inclined to think they are more likely a false positive. If they overlap with lots of people from the same group and you can see many people in it with shared ancestry, I would have thought it's probably not a false positive. Incoherent matches or no matches and I'd be inclined to believe it's a false positive.
I base that also on using MyHeritage, where there are people in the 40-50cM range who have no shared matches with overlapping segments and a tree where you can see no link.
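The rule of thumb above (coherent shared matches are probably real, scattered or absent ones are suspect) can be written down as a small check. This is a sketch only: the thresholds of three shared matches and a 60% dominant share are arbitrary assumptions, the group labels are ones you would have assigned yourself, and the function name is made up.

```python
from collections import Counter


def classify_match(shared_match_groups, min_shared=3, min_dominant_share=0.6):
    """Label a match by how coherent its shared matches are.

    `shared_match_groups` holds the group label of each shared match,
    with None for shared matches not yet assigned to any group.
    Both thresholds are illustrative guesses, not tested values.
    """
    labelled = [g for g in shared_match_groups if g is not None]
    if len(labelled) < min_shared:
        return "suspect: too few shared matches"
    top_group, top_count = Counter(labelled).most_common(1)[0]
    if top_count / len(labelled) >= min_dominant_share:
        return f"coherent: {top_group}"
    return "suspect: shared matches spread across groups"


if __name__ == "__main__":
    print(classify_match(["Roberts"] * 5 + ["Smith"]))
    print(classify_match(["A", "B", "C", "D"]))
    print(classify_match([]))
```

It cannot prove a match is false; it only sorts matches into ones worth researching first and ones to treat with caution.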
-
To add to that £7.50/ month is a lot for Pro Tools. As a programmer I can say the cost of offering the Pro Tools is around £0. For this price - £90 / year - they really should add a tool (like Thru Lines) that shows any ancestor in match trees that occurs more than once, extending the trees using their grouping of suspected duplicate individuals in member trees. Invaluable. Huge time saver. Auto-clustering of some sort would also be useful. With Pro Tools I can see that there are clusters of common matches below 20cM that you can assign as reliably as ones above 20cM. Not sure if the whole cluster could be a false positive.
-
Pro Tools is a dip in and exit add on.
Subscribe then cancel and just use it for a month.
Wait a few more months and subscribe again.
As it is it has just proved useful.
My Wife has a DNA match of 171cM and we know that he is the Grandson of her Maternal Cousin, but he matches Paternal DNA matches that my Wife has, and as the two sides were separated by 300 miles there is no pedigree collapse in recent times.
Now we know that the guess work we made in his tree is in error and now we can look at the shared matches branches to see if we can find the elusive Paternal links.
So this month’s subs were worth it.