The Geography of Recent Genetic Ancestry across Europe.

Supplemental figures S1–S6

Figure S1: Normalized density of IBD blocks of different lengths, corrected for SNP density, across all autosomes (see section 4.3 for details). Centromeres are marked with a grey bar and “c”; a large, segregating inversion (17) is marked with “8p”. The grey curve along the bottom shows normalized SNP density.
Figure S2: Two measures of overdispersal of block numbers across individuals (i.e. substructure): Suppose we have $n$ individuals from population $x$, and $N_{iy}$ is the number of IBD blocks of length at least 1cM that individual $i$ shares with anyone from population $y$. Our statistic of substructure within $x$ with respect to $y$ is the variance of these numbers, $s_{xy}=\frac{1}{n-1}\left(\sum_{i}N_{iy}^{2}-\frac{1}{n}\left(\sum_{i}N_{iy}\right)^{2}\right)$. We obtained a “null” distribution for this statistic by randomly reassigning all blocks shared between $x$ and $y$ to an individual from $x$, and used this to evaluate the strength and the statistical significance of this substructure. (A) Histogram of the “$p$-value”, i.e. the proportion of 1000 replicates that showed a variance greater than or equal to the observed variance $s_{xy}$, for all pairs of populations $x$ and $y$ with at least 10 individuals in population $y$. (B) The “$z$ score”, which is the observed value $s_{xy}$ minus the mean value, divided by the standard deviation, with mean and standard deviation estimated using 1000 replicates. The population $x$ is shown on the vertical axis, with text labels giving $y$, so for instance, Italians show much more substructure with most other populations than do Irish. Note that sample size still has a large effect: it is easier to see substructure with respect to the Swiss French ($x=$CHf) because the large number of Swiss French samples allows greater resolution. A vertical line is shown at $z=5$. Only pairs of populations with at least 3 samples in country $x$ and 10 samples in country $y$ are shown. Because of the log scale, only pairs with a positive $z$ score are shown, but no comparisons had $z<-2.5$, and only three had $z<-2$.
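The substructure statistic and its resampling null described for Figure S2 can be sketched as follows. This is a minimal illustration, not the code used for the figure. It assumes that "randomly reassigning" means each shared block is given to one of the n individuals in x uniformly and independently, so that a replicate is a multinomial draw. The function name `substructure_test`, its parameters, and the example counts are hypothetical.

```python
import numpy as np

def substructure_test(counts, n_reps=1000, seed=0):
    """Variance statistic, resampling p-value and z score for one (x, y) pair.

    counts[i] is the number of IBD blocks (>= 1cM) that individual i of
    population x shares with anyone from population y.
    Assumption: under the null, each block is reassigned uniformly at
    random to one individual of x.
    """
    counts = np.asarray(counts, dtype=np.int64)
    n = counts.size
    rng = np.random.default_rng(seed)

    # Observed statistic: sample variance with the 1/(n-1) normalization.
    observed = counts.var(ddof=1)

    # Null replicates: redistribute the same total number of blocks
    # across the n individuals with equal probability.
    total = int(counts.sum())
    null_counts = rng.multinomial(total, np.full(n, 1.0 / n), size=n_reps)
    null_var = null_counts.var(axis=1, ddof=1)

    # "p-value": proportion of replicates with variance >= observed.
    p_value = float(np.mean(null_var >= observed))
    # "z score": (observed - null mean) / null standard deviation.
    # Undefined if every replicate has the same variance (e.g. total == 0).
    z_score = float((observed - null_var.mean()) / null_var.std(ddof=1))
    return float(observed), p_value, z_score

# Hypothetical counts: two individuals carry almost all the shared blocks.
example_counts = [0, 0, 1, 0, 12, 0, 1, 0, 0, 10]
s_xy, p, z = substructure_test(example_counts)
print(f"s_xy = {s_xy:.3f}, p = {p:.3f}, z = {z:.2f}")
```

With counts concentrated in a few individuals, as in this made-up example, the observed variance lies far above the resampled variances, which corresponds to a small p-value and a large positive z score.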
Figure S3: (A) Mean numbers of IBD blocks of length at least 1cM per pair of individuals, shown as a modified Cleveland dotchart, with ±2 standard deviations shown as horizontal lines. For instance, on the bottom row we see that someone from the UK shares on average about one IBD block with someone else from the UK and slightly less than 0.2 blocks with someone from Turkey. Note that in most cases, the distribution of block numbers is fairly concentrated, and that nearby populations show quite similar patterns.
Figure S4: The positions of our samples on the first two principal components of the genotype matrix, as produced by EIGENSTRAT (58). Population centroids are marked by text and a transparent circle. Note the correspondence to a map of Europe, after a rotation and flip.
Figure S5: Comparison of figure 2A in the main text to figure S4 – the axes are self-explanatory; the colors and symbols are the same as in figure 2A.
Figure S6: Comparison of figure 2B in the main text to figure S4 – the axes are self-explanatory; the colors and symbols are the same as in figure 2B. The four outlying UK individuals are, as in figure 2B, three who share a very high number of IBD blocks with Italians, and one who shares a very high number with the Slovakian sample.