## Abstract

Expressions for the joint genotypic probabilities of two related individuals are used in many population and quantitative genetic analyses. These expressions, resting on a set of 15 probabilities of patterns of identity by descent among the four alleles at a locus carried by the relatives, are generally well known. There has been recent interest in special cases where the two individuals are both related and inbred, although there have been differences among published results. Here, we return to the original 15-probability treatment and show appropriate reductions for relatives when they are drawn from a population that itself is inbred or when the relatives have parents who are related. These results have application in affected-relative tests for linkage, and in methods for interpreting forensic genetic profiles.

## 1. Introduction

Many developments in population and quantitative genetics require knowledge of the probability that two individuals have specified genotypes. This article has been motivated in part by work that tests for linkage between disease susceptibility and marker loci on the basis of family pedigrees (e.g. Merette & Ott 1996) or of the marker genotypes of affected relatives (e.g. Liu & Weir 2004), and in part by the need to generate matching probabilities when genetic profiles are used in forensic science (e.g. Weir 1994). There has been parallel activity in quantitative genetics concerning the covariances of relatives, especially when they are inbred (e.g. Wright 1987). This article considers the joint genotype probabilities for pairs of individuals, expressed in terms of various alternative formulations of the probabilities of identity by descent (ibd) among the four alleles they carry at a locus. It does not address the interesting problems in estimating the degree of relatedness for pairs of individuals (e.g. Ritland & Travis 2004).

Affected relative tests examine the extent of marker allele sharing for these individuals and indicate linkage between marker and disease loci when this sharing is greater than expected. Génin & Clerget-Darpoux (1996) studied the effects of inbreeding on affected sib pair tests and extended existing tests to apply to sib pairs sampled from a consanguineous population. More recently, Liu & Weir (2004) made adjustments to affected sib pair tests of Lange (1986*a*,*b*) for the case when the parents of sib pairs belong to a population in a state of equilibrium between drift and mutation. They took into account that consequent inbreeding of the sib pair elevates the degree of allele sharing.

The work of Génin & Clerget-Darpoux (1996) was based on their expressions for joint genotype probabilities for sib pairs that involved allele frequencies and the inbreeding coefficient in the parental population. They assumed that all nine condensed ibd relationships (Jacquard 1970) among the four alleles carried by two individuals could be written as functions of the inbreeding coefficient. Although this is possible in some cases, Weeks & Sinsheimer (1998) pointed out that the results of Génin & Clerget-Darpoux did not lead to known results for ibd measures. Weeks & Sinsheimer (1998) presented some affected relative tests for linkage that took account of the identity of alleles between individuals (their relationship) and within individuals (their inbreeding). Cannings (1998) also wrote that the results of Génin & Clerget-Darpoux did not correspond to ‘fully specified population model[s] (except in certain very restricted and uninteresting cases).’ Génin & Clerget-Darpoux (1998*a*,*b*) responded to the criticisms of Cannings (1998) and Weeks & Sinsheimer (1998) and presented new relationships between the ibd measures and the inbreeding coefficient. However, we believe that these new expressions are still restricted, as indeed are those of Cannings (1998).

Joint genotypic probabilities are needed for the conditional probabilities of one genotype given another. In forensic science, there is a need for the probability that an unknown person (e.g. a perpetrator) has a particular genotype given that a known person (e.g. a suspect) has that genotype. Balding & Nichols (1994) derived conditional probabilities for an equilibrium population, and Weir (1994) gave more general expressions with an approximation for the pure genetic drift case.

In this article, we review known expressions for joint genotype probabilities for two individuals and discuss the special cases of relatedness with and without inbreeding. We treat the case of sib pairs in equilibrium populations when their parents are either unrelated or are first cousins.

## 2. Basic identity measures

We will retain consistency with previous notation (e.g. Weir 1994) by considering alleles *a*, *b*, *c*, *d* at an autosomal locus. These letters serve only to identify the alleles and do not specify allelic type. For locus *A*, the alleles will be written as *A*_{i} and the probability that an allele is of type *A*_{i} will be written as *p*_{i}. It has long been recognized that there are 15 patterns of ibd among the four alleles carried by two individuals, and these and their probabilities are shown in table 1 along with the notation of Cockerham (1971) and of Jacquard (1970). Equivalent treatments have been given by Harris (1964) and Gillois (1965).

The measures in table 1 are the most natural parameters for expressing the joint probabilities of two genotypes. If individuals *X* and *Y* carry alleles *ab* and *cd*, respectively, then adding over both orders of alleles within heterozygotes, but not over both orders of different genotypes:

(2.1)

Although it may be easiest to calculate values for all 15 ibd measures, equation (2.1) shows that only nine functions of them are needed for joint genotype probabilities. These nine condensed measures are shown in table 2, again with the mnemonic notation of Cockerham (1971) and the simple notation of Jacquard (1970) and Cannings (1998). Each of the nine condensed ibd states can be described with the notation of Thompson (1974). The numbers *a*_{1}, *a*_{2}, *a*_{3}, *a*_{4} indicate the two alleles in the first individual (*ab* in our notation) followed by the two alleles in the second individual (*cd* in our notation). Alleles that are ibd receive the same number and, in the condensed set, 1112 indicates the same state as 1121, for example.
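The reduction from 15 ibd patterns to nine condensed states can be checked by direct enumeration. The following Python sketch is an illustration only: the function names are hypothetical, and the numeric labels follow the four-digit convention described above, with the first two positions for individual *X* and the last two for individual *Y*. It lists every partition of the four alleles and then treats two patterns as the same condensed state when they differ only by the order of alleles within an individual.

```python
from itertools import product


def relabel(state):
    """Renumber labels in order of first appearance, e.g. (2, 1, 1, 1) -> (1, 2, 2, 2)."""
    mapping = {}
    out = []
    for label in state:
        if label not in mapping:
            mapping[label] = len(mapping) + 1
        out.append(mapping[label])
    return tuple(out)


def all_ibd_states():
    """All distinct ibd patterns among four alleles (a, b, c, d)."""
    return sorted({relabel(s) for s in product(range(1, 5), repeat=4)})


def condense(state):
    """Canonical representative after ignoring allele order within each individual."""
    a, b, c, d = state
    variants = [(a, b, c, d), (b, a, c, d), (a, b, d, c), (b, a, d, c)]
    return min(relabel(v) for v in variants)


states = all_ibd_states()
condensed_states = sorted({condense(s) for s in states})

print(len(states), "patterns condense to", len(condensed_states), "states")
for rep in condensed_states:
    members = ["".join(map(str, s)) for s in states if condense(s) == rep]
    print("".join(map(str, rep)), "<-", members)
```

Swapping the two individuals is deliberately not treated as an equivalence here, because the joint probability distinguishes which genotype belongs to *X* and which to *Y*.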

Although the measures displayed in tables 1 and 2 are the ones needed for joint genotype probabilities, neither set is the natural one for evaluation for a specified degree of relationship. The more natural set lists only the alleles that are ibd to each other. The alleles for which there are ibd relationships can be identified either by how they are arranged within individuals or by the individuals from which they are randomly drawn. Nothing is implied about the ibd status of alleles not identified in these measures, unlike the case in tables 1 and 2 where alleles not identified are assumed not to be ibd to those that are identified. Identifying alleles by their arrangement within individuals *X*(*ab*) and *Y*(*cd*) yields 14 summary measures, as shown in table 3. Identifying alleles by the individuals in which they are carried, or from which they are drawn, the summary measures can be reduced to a set of eight measures:

and these lead to the nine terms needed for joint genotypic probabilities:

(2.2)

An alternative reparameterization was given by Karigl (1981) who identified alleles entirely by the individuals from which they were drawn. In table 4, we show the equivalences between his and Cockerham's measures for alleles drawn from one or two individuals.

## 3. Special cases

### (a) No inbreeding or relatedness

When two individuals are not related and not inbred, all four of the alleles they carry at a locus are not ibd, and all descent measures except *δ*_{0} are zero. Joint genotypic probabilities are just the appropriate products of allele probabilities with a factor of two for each heterozygote, and conditional probabilities reduce to single genotype probabilities.

### (b) No inbreeding

Non-inbred individuals do not carry ibd alleles at a locus, and the relationship between two individuals can be summarized with the probabilities that they carry zero, one or two pairs of ibd alleles. If the probabilities of these three events are written as *P*_{0}, *P*_{1}, *P*_{2}, respectively:

Values of *P*_{0}, *P*_{1}, *P*_{2} for common relatives are shown below in table 7, and the joint genotype probabilities in equations (2.1) are shown in table 5. Also shown in table 5 are conditional probabilities of one genotype given another, as well as the probability of two genotypes for related individuals divided by the probability of those genotypes if they are from unrelated people. This last ratio is used in the identification of human remains, when genotypes are available from a sample thought to be from a missing person and from a family member of that person (Brenner & Weir 2003).
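To illustrate how *P*_{0}, *P*_{1}, *P*_{2} determine joint genotype probabilities for non-inbred relatives, the following Python sketch sums over the founder alleles under each of the three ibd events. It is a sketch under stated assumptions rather than a reproduction of table 5: the function names, the allele-frequency dictionary and the example values (full sibs with *P*_{0}=1/4, *P*_{1}=1/2, *P*_{2}=1/4) are illustrative, and alleles that are not ibd are assumed to be independent draws from the allele frequencies.

```python
from itertools import product


def genotype_prob(g, freqs):
    """Single-genotype probability for a non-inbred individual."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def one_ibd_prob(g1, g2, freqs):
    """Joint probability when exactly one allele of X is ibd to one allele of Y.

    X carries (s, u) and Y carries (s, v), where s is the shared allele and
    s, u, v are independent draws from the allele frequencies.
    """
    total = 0.0
    for s, u, v in product(freqs, repeat=3):
        if tuple(sorted((s, u))) == g1 and tuple(sorted((s, v))) == g2:
            total += freqs[s] * freqs[u] * freqs[v]
    return total


def joint_genotype_prob(g1, g2, freqs, ibd_probs):
    """Pr(X has genotype g1 and Y has genotype g2) for non-inbred relatives.

    ibd_probs is the triple (P0, P1, P2).
    """
    p0, p1, p2 = ibd_probs
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    zero_shared = genotype_prob(g1, freqs) * genotype_prob(g2, freqs)
    one_shared = one_ibd_prob(g1, g2, freqs)
    two_shared = genotype_prob(g1, freqs) if g1 == g2 else 0.0
    return p0 * zero_shared + p1 * one_shared + p2 * two_shared


# Illustrative values only.
example_freqs = {"A": 0.2, "B": 0.3, "C": 0.5}
full_sibs = (0.25, 0.5, 0.25)
unrelated = (1.0, 0.0, 0.0)

print(joint_genotype_prob(("A", "A"), ("A", "A"), example_freqs, full_sibs))
print(joint_genotype_prob(("A", "B"), ("A", "C"), example_freqs, unrelated))
```

Setting (*P*_{0}, *P*_{1}, *P*_{2}) = (1, 0, 0) recovers the case of unrelated, non-inbred individuals, where the joint probability is the product of the two single-genotype probabilities.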

### (c) Random-mating population

In a population mating completely at random, including a random amount of selfing, the ibd relationships among alleles do not depend on the arrangement of those alleles among individuals. Cannings (1998) terms this situation ‘permutable’. Instead, it is necessary to identify only the number of ibd alleles: two, three, two pairs or four. The probabilities of these relationships can be written as *θ*, *γ*, *Δ* and *δ*, respectively. In other words,

The nine condensed measures in this case are shown in table 6.

Cannings (1998) points out that, in general, the inbreeding coefficients *F*_{X}, *F*_{Y} differ from coancestry coefficients *θ*_{XY}. The distinction is necessary if there is not completely random union of gametes, but even for dioecious diploids, in which selfing is excluded, the difference will be very small, especially in equilibrium populations. The distinction is not needed in the permutable case, and Cannings' parameterization for that case is shown in table 6. He uses three parameters, instead of the four that we use, apparently because he regards the probability of two pairs of ibd alleles as the square of the probability of one pair of ibd alleles. His *α*, *α*^{2}, *α*^{2}*β* are the same as our *θ*, *Δ*, *δ*. His *α*(1−*α*)*γ* is the same as our 2(*γ*−*δ*). Our *Δ* is the probability that two pairs of alleles are ibd, whether or not all four alleles are ibd. Therefore, *Δ* is greater than the probability *δ* that all four alleles are ibd, and it is not the product of the single pair probabilities.

In the special case when the mutation process does not depend on allelic state, then allele frequencies have a Dirichlet distribution over replicate populations (Balding & Nichols 1994). This provides

and the nine condensed ibd probabilities reduce to the values shown in table 6. They are all functions of the single parameter *θ*. Génin & Clerget-Darpoux (1998*b*) also presented a single-parameter formulation (table 6). Their *α* is the same as our *θ*, and they assign *α*^{2} to both of our *γ* and *Δ*, and they assign *α*^{3} to our *δ*.
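One way to see how *γ*, *Δ* and *δ* become functions of *θ* alone is to expand the Dirichlet probabilities of small allele sets and match the lowest-order terms in the allele frequencies against the ibd interpretation. The following is a sketch under the Dirichlet assumption, using the commonly quoted form of the sampling formula for the next allele; it is a reconstruction offered for orientation and may differ in presentation from the original display.

```latex
% Assumed sampling rule: probability that the next allele is A_i when n_i of n are already A_i
\Pr(A_i \mid n_i \text{ of } n) = \frac{n_i\theta + (1-\theta)p_i}{1 + (n-1)\theta}

% Three alleles all of type A_i: the coefficient of p_i is the three-allele ibd probability
\Pr(A_iA_iA_i) = \frac{p_i\,[\theta + (1-\theta)p_i]\,[2\theta + (1-\theta)p_i]}{1+\theta}
              = \gamma p_i + 3(\theta-\gamma)p_i^2 + (1 - 3\theta + 2\gamma)p_i^3
\quad\Longrightarrow\quad
\gamma = \frac{2\theta^2}{1+\theta}

% Four alleles all of type A_i: the coefficient of p_i is the four-allele ibd probability
\delta = \frac{\theta \cdot 2\theta \cdot 3\theta}{(1+\theta)(1+2\theta)}
       = \frac{6\theta^3}{(1+\theta)(1+2\theta)}

% Two alleles A_i and two alleles A_j (i \ne j): the coefficient of p_i p_j is \Delta - \delta
\Delta - \delta = \frac{\theta^2(1-\theta)}{(1+\theta)(1+2\theta)}
\quad\Longrightarrow\quad
\Delta = \frac{\theta^2(1+5\theta)}{(1+\theta)(1+2\theta)}
```

For small *θ* these give *γ* ≈ 2*θ*^{2} and *Δ* ≈ *θ*^{2}, so under this sketch *Δ* is close to, but not equal to, the square of the single-pair probability.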

The joint genotypic probabilities in the Dirichlet case are

These equations may be derived directly from Ewens' sampling formula: the probability of allele *A*_{i} given that *n*_{i} copies of that allele have already been found in a set of *n* alleles, is
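The sequential sampling argument described above can be sketched in Python. The code assumes the commonly quoted form of the rule, in which the probability of the next allele being *A*_{i} is [*n*_{i}*θ* + (1−*θ*)*p*_{i}] / [1 + (*n*−1)*θ*]; that expression, the function names and the example values are assumptions made for illustration, not a transcription of the original equations. The joint probability of two genotypes is built by drawing the four alleles one at a time and applying a factor of two for each heterozygote.

```python
def next_allele_prob(allele, seen, freqs, theta):
    """Probability that the next allele sampled is `allele`, given those already seen.

    Assumed rule: (n_i * theta + (1 - theta) * p_i) / (1 + (n - 1) * theta),
    where n_i of the n alleles seen so far are of the same type. Requires theta < 1.
    """
    n = len(seen)
    n_i = seen.count(allele)
    return (n_i * theta + (1 - theta) * freqs[allele]) / (1 + (n - 1) * theta)


def allele_set_prob(alleles, freqs, theta):
    """Probability of an ordered set of alleles, built one allele at a time."""
    prob = 1.0
    seen = []
    for allele in alleles:
        prob *= next_allele_prob(allele, seen, freqs, theta)
        seen.append(allele)
    return prob


def joint_genotype_prob_dirichlet(g1, g2, freqs, theta):
    """Joint probability of two genotypes, with a factor of two per heterozygote."""
    multiplicity = (2 if g1[0] != g1[1] else 1) * (2 if g2[0] != g2[1] else 1)
    return multiplicity * allele_set_prob(list(g1) + list(g2), freqs, theta)


# Illustrative values only.
example_freqs = {"A": 0.1, "B": 0.3, "C": 0.6}
example_theta = 0.03

print(joint_genotype_prob_dirichlet(("A", "A"), ("A", "A"), example_freqs, example_theta))
print(joint_genotype_prob_dirichlet(("A", "B"), ("A", "B"), example_freqs, example_theta))
```

Because the alleles are exchangeable under this rule, the order in which the four alleles are drawn does not change the result, which is why a single ordering multiplied by the heterozygote factors is enough.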

The match probability in forensic science is the probability that an unknown person has a particular genotype given that a known person has been found to have that genotype (Balding & Nichols 1994; Weir 1994). In the Dirichlet case, these match probabilities are

It is of forensic significance that these match probabilities are greater than the genotype probabilities Pr(*A*_{i}*A*_{i}) and Pr(*A*_{i}*A*_{j}).
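A short Python sketch can make the comparison between match probabilities and genotype probabilities concrete. It assumes the widely quoted Dirichlet-based forms associated with Balding & Nichols (1994): [2*θ* + (1−*θ*)*p*_{i}][3*θ* + (1−*θ*)*p*_{i}] / [(1+*θ*)(1+2*θ*)] for homozygotes and 2[*θ* + (1−*θ*)*p*_{i}][*θ* + (1−*θ*)*p*_{j}] / [(1+*θ*)(1+2*θ*)] for heterozygotes, with single-genotype probabilities *p*_{i}[*θ* + (1−*θ*)*p*_{i}] and 2*p*_{i}*p*_{j}(1−*θ*). The function names and numerical values are illustrative assumptions.

```python
def match_prob_homozygote(p_i, theta):
    """Pr(unknown is AiAi | known is AiAi) under the assumed Dirichlet form."""
    numerator = (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i)
    return numerator / ((1 + theta) * (1 + 2 * theta))


def match_prob_heterozygote(p_i, p_j, theta):
    """Pr(unknown is AiAj | known is AiAj) under the assumed Dirichlet form."""
    numerator = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return numerator / ((1 + theta) * (1 + 2 * theta))


def genotype_prob_homozygote(p_i, theta):
    """Pr(AiAi) for a single individual in a population with coancestry theta."""
    return p_i * (theta + (1 - theta) * p_i)


def genotype_prob_heterozygote(p_i, p_j, theta):
    """Pr(AiAj) for a single individual in a population with coancestry theta."""
    return 2 * p_i * p_j * (1 - theta)


# Illustrative values only.
p_i, p_j, theta = 0.1, 0.2, 0.03

print("homozygote:  ", match_prob_homozygote(p_i, theta), "vs", genotype_prob_homozygote(p_i, theta))
print("heterozygote:", match_prob_heterozygote(p_i, p_j, theta), "vs", genotype_prob_heterozygote(p_i, p_j, theta))
```

With *θ*=0 both match probabilities fall back to the ordinary genotype probabilities *p*_{i}^{2} and 2*p*_{i}*p*_{j}; for the illustrative values above, the match probabilities exceed the corresponding genotype probabilities.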

Weir (1994) gave alternative relationships between *θ*, *γ*, *Δ*, *δ* for the pure drift case. His approximations were based on the transition equations for the four measures over time.

### (d) Relatives in a random-mating population

When relatives are taken from a population with a common background level of relatedness, account must be taken of the fact that they will be inbred. The simplest situation is when the population is in equilibrium with respect to ibd status of random sets of alleles. The four alleles in the two relatives in question are traced back to the most recent common ancestors of those individuals, where they may be copies of four or fewer alleles in the ancestors. The relationships among those alleles are governed by the population-wide measures *θ*, *γ*, *Δ*, *δ* regardless of which generation the ancestors belong to. Because alleles are being traced back from relatives to common ancestors, it is much easier to work with the summary measures. The values are shown in table 7. Weir (1994) showed the more complicated expressions for the original 15 measures.

### (e) Sibs from related parents

As a final special case, suppose two full sibs have parents who are related by family membership and by membership of an equilibrium population. For example, suppose two sibs have parents *X*(*ab*) and *Y*(*cd*) who are cousins. This case was considered by Génin & Clerget-Darpoux (1996, 1998*a*,*b*) and by Cannings (1998). The relationships among the four parental alleles are given in table 7, and in particular:

If the sibs are *G*(*xy*) and *H*(*zw*), where alleles *x*, *z* are from cousin *X* and alleles *y*, *w* are from cousin *Y*, then the summary measures for these alleles are

The nine condensed probabilities needed for the joint genotype probabilities are

These are the same as those of Génin & Clerget-Darpoux (1998*a*) under their assumption that *γ*=*Δ*=*θ*^{2}, *δ*=*θ*^{3} and, therefore, in the case that *θ*=*γ*=*Δ*=*δ*=0. Otherwise, they are slightly different, and they do not have the pattern that suggested to Génin & Clerget-Darpoux (1996) that some of them were equal and could be pooled.

In their discussion of affected-sib tests, Génin & Clerget-Darpoux (1998*a*) gave the probabilities that two sibs whose parents were cousins share zero, one or two pairs of alleles ibd. These three probabilities, using our results, are

In the Dirichlet case, these reduce to

## 4. Discussion

Génin & Clerget-Darpoux (1996) pointed out that affected sib pair tests of linkage (Blackwelder & Elston 1985) can have higher than expected type I errors when ‘remote consanguinity’ is ignored. They were referring to the low level of inbreeding and relatedness in actual populations caused by the evolutionary process, primarily drift. In this article, we quantify the degree of inbreeding for relatives taken from such populations, and show how our results differ from those of Génin & Clerget-Darpoux (1996, 1998*a*,*b*) and Cannings (1998). The degree of inbreeding of relatives is further increased if their parents were related, first cousins for example, and our treatment allows those cases also to be considered.

Although complete treatments of joint genotype probabilities require knowledge of 15 different patterns of ibd for four alleles, this is too cumbersome an approach in general. Assumptions of random mating and ‘permutability’ can reduce this system down to just four probabilities, but estimation procedures are in place for only the two-allele probability (Weir & Hill 2002). There is great appeal, therefore, in assuming that allele frequencies follow a Dirichlet distribution so that only the two-allele measure is needed. There are sound reasons for adopting this distribution for the infinite-alleles mutation model (Balding & Nichols 1994), but some slight ground for concern for stepwise mutation models (Graham *et al*. 2000). Even for microsatellite markers, however, where the stepwise model may have relevance, some empirical evidence on matching probabilities suggests that results based on the Dirichlet distribution can be used with confidence. As we have already done (Liu & Weir 2004) with the affected sib pair tests of Lange (1986*a*,*b*), we suggest that tests of the type proposed by Génin & Clerget-Darpoux (1996) be developed for inbred affected relatives and that these tests be constructed on the basis of joint genotype probabilities expressed in terms of allele frequencies and the coancestry coefficient *θ*.

## Acknowledgments

This work was supported in part by NIH grant GM 45344. Helpful comments were made by Dr Gary Chase.

This work is dedicated to Bill Hill in recognition of 25 years of friendship and collaboration.

## Footnotes

† Present address: Department of Health Evaluation Sciences, Penn State College of Medicine, A210., Suite 2200, 600 Centerview Dr., Hershey, PA 17033-0855, USA.

One contribution of 16 to a Theme Issue ‘Population genetics, quantitative genetics and animal improvement: papers in honour of William (Bill) Hill’.

© 2005 The Royal Society