The Geography of Recent Genetic Ancestry across Europe.

Supplemental figures S7–S13

Figure S7: Correlations in IBD rates, for six different length windows (omitted length windows are similar). If there are $n$ populations, $I(x,y)$ is the mean number of blocks in the given length range shared by a pair from populations $x$ and $y$, and $\bar{I}(x)=(1/(n-1))\sum_{z\neq x}I(x,z)$, shown is $(1/(n-2))\sum_{z\notin\{x,y\}}(I(x,z)-\bar{I}(x))(I(y,z)-\bar{I}(y))$.
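As an illustration of the quantity defined in the caption of figure S7, the following is a minimal sketch in plain Python. It assumes the sharing rates are available as a symmetric square table `I`, with `I[x][y]` the mean number of blocks shared between populations `x` and `y`; the function name `ibd_covariance` and this data layout are hypothetical and are not taken from the analysis code behind the figure. The statistic is defined for pairs of distinct populations, so diagonal entries are left as `None`.

```python
def ibd_covariance(I):
    """Compute (1/(n-2)) * sum over z not in {x, y} of
    (I[x][z] - Ibar[x]) * (I[y][z] - Ibar[y]) for every pair x != y.

    I is a square list of lists of mean block counts (assumed layout).
    """
    n = len(I)
    if n < 3:
        raise ValueError("need at least three populations")
    # Row means that leave out the within-population entry I[x][x].
    Ibar = [sum(I[x][z] for z in range(n) if z != x) / (n - 1) for x in range(n)]
    C = [[None] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x == y:
                continue  # only defined for distinct populations
            total = sum(
                (I[x][z] - Ibar[x]) * (I[y][z] - Ibar[y])
                for z in range(n)
                if z != x and z != y
            )
            C[x][y] = total / (n - 2)
    return C


# Toy input with three populations (made-up numbers, for illustration only).
toy = [[5, 1, 2],
       [1, 4, 3],
       [2, 3, 6]]
result = ibd_covariance(toy)
```

Because the sum leaves out both `x` and `y`, the within-population rates on the diagonal of `I` never enter the result.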
Figure S8: The same plot as figure 3G–I, but rendered as an SVG figure with tooltips that allow identification of individual points (using R 55) – open the file in a reasonably compliant browser (e.g. Firefox, Opera) or SVG browser (e.g. squiggle) and hover the mouse over a point of interest to see the label.
Figure S9: Mean IBS (“Identity by State”) against geographic distance, calculated using plink (60) as described in the main text, using the same groups and fitting the same curves as in figure 3 of the main text. The lowest set of points, roughly following a line, are mean IBS with Turkey; unlike with IBD, mean IBS with Cyprus was significantly higher. In fact, the other rough line of points (between the comparisons to Turkey and the orange points) is almost entirely mean IBS with Cyprus, as well as mean IBS to Slovakia. Since Slovakia is only represented by a single individual in the dataset, we cannot reach further conclusions.
Figure S10: Goodness-of-fit for our estimated error distribution – points show data from simulations (described in the text), and lines show the parametric forms of equations (1). Each simulated IBD block of length $x$ was either found by BEAGLE (and passed our filters) or was not; and if it was found, it had inferred length $y=x+\epsilon$, i.e. with length error $\epsilon$. The top figure shows the probability that a segment of a given length is missed entirely (and $1-c(x)$) in green, the probability that $\epsilon>0$ given the segment was found (and $\gamma(x)$) in black, and the probability that $\epsilon\leq 0$ given the segment was found (and $1-\gamma(x)$) in red. The second figure shows the probability density of all positive $\epsilon$ (in black, with $\lambda_{+}(x)$), and probability densities of positive $\epsilon$ for various categories of true length $x$ (colors). The third figure is similar to the second, except that it shows negative $\epsilon$. Note that blocks with inferred length $y<1$ were omitted.
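The observation process summarized in the caption of figure S10 can be sketched as a small sampling routine. This is only an illustration under stated assumptions: the parametric forms of equations (1) are not reproduced here, so the detection probability `c`, the probability of a positive error `gamma`, and the two error-magnitude samplers `draw_pos` and `draw_neg` are hypothetical callables supplied by the caller, and `observe_block` is an invented name.

```python
import random


def observe_block(x, c, gamma, draw_pos, draw_neg, rng, min_len=1.0):
    """Return the inferred length y = x + eps of a true block of length x,
    or None if the block is missed or falls below the length cutoff.

    c(x):              probability the block is found (assumed callable)
    gamma(x):          probability that eps > 0, given the block was found
    draw_pos(x, rng):  magnitude of a positive error (assumed sampler)
    draw_neg(x, rng):  magnitude of a non-positive error (assumed sampler)
    """
    if rng.random() >= c(x):
        return None  # missed entirely, with probability 1 - c(x)
    if rng.random() < gamma(x):
        eps = draw_pos(x, rng)    # block inferred longer than it is
    else:
        eps = -draw_neg(x, rng)   # block inferred shorter (or equal)
    y = x + eps
    # Blocks with inferred length below the cutoff are omitted.
    return y if y >= min_len else None


# Deterministic stand-ins, chosen only to make the control flow visible.
rng = random.Random(0)
always = lambda x: 1.0
never = lambda x: 0.0
half = lambda x, rng: 0.5

found_longer = observe_block(3.0, always, always, half, half, rng)
found_shorter = observe_block(3.0, always, never, half, half, rng)
missed = observe_block(3.0, never, always, half, half, rng)
too_short = observe_block(1.2, always, never, half, half, rng)
```

Passing the model components in as callables keeps the sketch independent of any particular fitted form; fitted functions could be substituted without changing the routine.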
Figure S11: Estimated false positive rates per pair, compared to the observed rate, as a function of block length. The black dotted curves show the mean number of IBD blocks per pair observed in the false positive simulations (see section 4.2 of the main text), per centiMorgan, binned at 0, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.2, 4.5, and 7.5 cM, and the parametric fit described in the text. The colored curves show the same quantity, separately for each pair of country comparisons, with the extreme values labeled. No comparisons other than Portugal–Portugal show any significant deviations from the parametric fit above 2 cM. For comparison, the black solid curve shows the mean observed IBD rate across the same set of individuals; note that e.g. the false positive rate for pairs of Portuguese individuals is higher than this at short lengths because the observed IBD rate between Portuguese at short block lengths is much higher than the overall mean.
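The binned quantity plotted in figure S11, the mean number of blocks per pair per centiMorgan, can be illustrated with a short sketch. The bin edges are those listed in the caption; the function name `rate_per_pair_per_cm`, the input format (a flat list of block lengths in cM plus a count of pairs), and the choice to ignore blocks at or beyond the last edge are assumptions made for this example.

```python
from bisect import bisect_right

# Bin edges in cM, as listed in the caption of figure S11.
EDGES = [0, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.2, 4.5, 7.5]


def rate_per_pair_per_cm(block_lengths, n_pairs, edges=EDGES):
    """Return one rate per bin: (blocks in bin) / (n_pairs * bin width)."""
    counts = [0] * (len(edges) - 1)
    for length in block_lengths:
        i = bisect_right(edges, length) - 1  # bin [edges[i], edges[i+1])
        if 0 <= i < len(counts):
            counts[i] += 1  # lengths outside the edges are ignored (assumption)
    return [
        counts[i] / (n_pairs * (edges[i + 1] - edges[i]))
        for i in range(len(counts))
    ]


# Made-up block lengths for ten pairs, for illustration only.
rates = rate_per_pair_per_cm([1.0, 1.3, 1.4, 5.0, 9.0], n_pairs=10)
```

Dividing by the bin width puts bins of unequal width on a common per-centiMorgan scale, which is what allows the short and long bins to be drawn on one curve.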
Figure S12: Estimated total numbers of genetic common ancestors shared by various pairs of populations, in roughly the time periods 0–500ya, 500–1500ya, 1500–2500ya, and 2500–4300ya. The population groupings are: “AL”, Albanian speakers (Albania and Kosovo); “S-C”, Serbo-Croatian speakers in Bosnia, Croatia, Serbia, Montenegro, and Yugoslavia; “R-B”, Romania and Bulgaria; “UK”, United Kingdom, England, Scotland, Wales; “Iber”, Spain and Portugal; “Bel”, Belgium and the Netherlands; “Bal”, Latvia, Finland, Sweden, Norway, and Denmark; any other label denotes a single population, with the same abbreviations as in table 1.
Figure S13: For those who are used to thinking in effective population sizes, the equivalent figure to figure S12, except with coalescent rate on the vertical axis, rather than numbers of most recent genetic common ancestors.