Why the X and Y don’t match

Because X-Y heteromorphism allowed the sex chromosomes to be followed through meiotic division, it was the first marker to be correlated with the segregation and inheritance of a phenotype (sex). Even today, molecular differences between X and Y are used to identify the sex chromosomes in genetic sequencing studies. Because of its pivotal role in the study of evolutionary genetics, the evolutionary source of this diagnostic heteromorphism has been subject to intense study. The problem is all the more interesting because the evolution of X and Y seems to be asymmetrical: the Y chromosome disproportionately loses genes compared to the X. Here I review the study of the divergence and heteromorphism between X and Y chromosomes.

As first noted by Morgan, recombination rates between the X and the Y chromosomes are unusually low compared to the rest of the genome. The lack of recombination between the X and Y chromosomes can lead to heteromorphism because new mutations are not exchanged between the pair, which allows the X and Y chromosomes to diverge over time. The divergence between the X and the Y, caused by the loss of recombination, leads to an absence of coalescence analogous to evolution in separate species. Indeed, the methods used to estimate divergence times between related species have been repurposed to estimate the time since recombination arrest between the X and Y. Using these techniques, molecular heteromorphism between X and Y can be quantified. In some plants such as kiwifruit, melon, willow, papaya, date palm and strawberry, genetic divergence is limited, while in others such as Silene, Cannabis, and some loci in Rumex the level of divergence is remarkable. Extreme heterogeneity in sex chromosome differentiation has been found between sister species of livebearers, whereas X-Y differentiation is polymorphic and correlated with geography in frogs, killifish and stickleback. In some cases, divergence is so pronounced that it leads to X-Y heteromorphism: the so-called ‘heteromorphic’, as opposed to ‘homomorphic’, sex chromosomes often reported in the literature. Whether at the chromosome scale or in terms of molecular evolution, divergence between X and Y is a quintessential feature of sex chromosome evolution.

Neutral divergence between X and Y is likely to participate to some extent in X-Y heteromorphism, but cannot alone account for one of the most striking aspects of the sex chromosomes: Y chromosomes have a general dearth of genes compared to the X. Even in very early studies, the absence of genes on the Y was attributed to a gradual degeneration over evolutionary time. First, surveys of sex chromosome karyotypes in the 20th century revealed a stunning variety of sex chromosomes; while some species had homomorphic sex chromosomes, other species completely lacked a Y chromosome. By envisioning this variation in sex chromosome heteromorphism as an evolutionary trajectory, it seemed plausible that Y chromosomes started as regular chromosomes (autosomes) but that some process caused gene loss and occasionally resulted in the loss of the entire Y chromosome. Second, because of the obvious cost of unmasking of X-linked recessive lethals only in males, the emptying of the Y seemed more likely to be a degenerative process than one involving optimization driven directly by natural selection. This led to the hypothesis that the Y chromosome may inevitably degenerate over time.

With the advent of large-scale genome sequencing, the Y was often found to be highly degenerate as expected. Beyond missing genes, Y chromosomes had an over-abundance of loci where the protein product was truncated, known as pseudo-genes. Y chromosomes were found to have very high rates of evolution at non-synonymous sites compared to synonymous sites: e.g. in apes, Drosophila, stickleback, birds, clam shrimp and plants including Silene, Rumex and papaya. Assuming synonymous sites evolve neutrally and non-synonymous substitutions are under selection, this is likely to be a signal of pronounced degeneration. As techniques were refined, more signals of degeneration appeared: Y chromosomes were found to use less favoured codons and to have lowered levels of gene expression in plants, mammals, Drosophila, butterflies and stickleback. A young Y chromosome in Drosophila miranda shows a reduction in adaptation compared to the X and an accumulation of deleterious mutations, including large deletions, premature stop codons and frameshift mutations. Y chromosome polymorphism even contributes epistatically to male fitness reduction in Drosophila, but not in frogs, suggesting degeneration can arise from a lowered efficacy of selection rather than as a completely neutral process.

The Y chromosome has also been shown to have an over-abundance of transposable elements. Since they are considered genomic parasites, this is further evidence for deleterious mutation accumulation on the Y. Transposable elements invaded early in sex chromosome evolution in fishes and birds. Microsatellites expanded on the Y chromosomes of Rumex, Hippophae seabuckthorn, Mercurialis annua and in the mostly haploid liverwort. Retrotransposons proliferate on the Y in Silene. Overall, many lines of evidence suggest Y chromosomes degenerate over time.

Alongside the progress in characterizing Y chromosome degeneration, the debate about what caused this degeneration continued. H.J. Muller proposed that Y degeneration resulted because selection was not able to act effectively on the Y chromosome and, later, suggested that selection was not effective on Y chromosomes because Ys were always masked by X chromosomes. Assuming gene loss had a recessive effect on fitness, the ever-heterozygous XY condition allowed gene loss because the X alleles could mask the losses on the Y. However, this model was shown to be inadequate because it required a higher rate of fixation of new mutations on the Y than on the X, and thus dominance effects alone were unlikely to account for Y chromosome degeneration. The idea that low rates of recombination may be implicated in degeneration arose from comparisons with studies of the relative inefficacy of selection in organisms that replicate clonally or otherwise asexually. Indeed, as shown by Muller and many studies since, asexual lineages are often less fit than sexual lineages. Because Ys do not recombine, they are likely to follow evolutionary trajectories similar to those of asexual lineages.

Incorporating finite population sizes and the influence of chance associations between alleles, known as linkage interference, into models of the evolution of sex was hugely successful in explaining the effects of asexuality on the efficacy of selection. The first model considering the association of recombination loss and fitness in finite populations was proposed by Muller in 1964. Muller proposed that disadvantageous mutations accumulate in asexuals through the chance loss from the population of all the haplotypes with the smallest number of deleterious mutations (the ‘least loaded class’). In the case of Y chromosomes, this model predicts that the fittest Y haplotypes in a population would eventually be lost by chance and would never be recovered. With each loss of the best Y haplotypes, the mean fitness of all Y haplotypes in the population falls further behind the mean fitness of Xs. This process became known as ‘Muller’s ratchet’. Another suite of processes, involving chance events in finite populations that lack recombination, could also cause degeneration. The random addition of alleles into a population by mutation, or their loss by drift, causes correlations between alleles in tightly linked regions. Without recombination, selection acts on blocks of alleles, rather than having the resolution allowed by recombination to act on each allele individually. Indeed, in 1966, Hill and Robertson used Monte Carlo computer simulations to show that selection was less effective than expected when the effects of random associations between alleles under selection were considered, and these results have been replicated since. This process, known as ‘linkage interference’ or the ‘Hill-Robertson effect’, is likely to be a powerful force in reducing the efficacy of selection in regions of low recombination.
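
The ratchet described above can be illustrated with a small simulation. The following is a minimal sketch, not a model taken from any of the studies discussed here: it assumes a haploid Wright-Fisher population of non-recombining Y chromosomes, multiplicative fitness (1 - s)^k for a chromosome carrying k deleterious mutations, a Poisson number of new mutations per chromosome per generation, and no back mutation. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's algorithm (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_ratchet(n_chromosomes=200, mutation_rate=0.3, s=0.02,
                     generations=300, seed=1):
    """Track the mutation count of the least loaded class over time.

    Assumptions (illustrative only): haploid Wright-Fisher sampling,
    multiplicative fitness (1 - s) ** k, no recombination, no back mutation.
    """
    rng = random.Random(seed)
    loads = [0] * n_chromosomes          # every Y starts mutation-free
    least_loaded = []
    for _ in range(generations):
        # Selection: parents are sampled in proportion to their fitness.
        weights = [(1.0 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=n_chromosomes)
        # Mutation: each offspring gains new deleterious mutations.
        loads = [k + poisson(rng, mutation_rate) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


def expected_least_loaded_size(n_chromosomes, mutation_rate, s):
    """Deterministic expectation N * exp(-U / s) for the size of the
    least loaded class at mutation-selection balance (Haigh's result
    for this fitness scheme)."""
    return n_chromosomes * math.exp(-mutation_rate / s)
```

Calling `simulate_ratchet()` returns, for each generation, the smallest number of mutations carried by any chromosome. Because there is neither recombination nor back mutation in this sketch, that number can only stay the same or increase, and each increase corresponds to one ‘click’ of the ratchet. `expected_least_loaded_size` indicates how exposed the best class is to chance loss: when it is small, the ratchet is expected to turn quickly.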

The effect of linkage interference on fitness is most pronounced under a molecular evolution framework devised by T. Ohta in the 1970s known as the ‘nearly neutral’ model. Under the nearly neutral model, most alleles reside on the border between being governed by selection and being governed by drift, where the product of the effective population size (Ne), which sets the strength of drift, and the selection coefficient (s) is about 1. When Ne⋅s < 1, selection cannot overcome the effects of drift. Linkage interference can be thought of as locally reducing the effective population size of a specific genomic region. The local reduction in Ne pushes nearly neutral variation into the zone where it is affected solely by drift (Ne⋅s < 1), reducing the efficacy of selection on nearly neutral variants. If most new mutations are slightly deleterious, as predicted by the nearly neutral model, linked sites under strong selection can cause these weakly deleterious alleles to spread and to fix. The removal of very deleterious alleles, known as ‘background selection’, or the spread of very beneficial alleles, known as ‘selective sweeps’, thus affects the likelihood of fixation of nearly neutral linked variants. The effects of linked selection increase as recombination rate decreases simply because more sites are subject to selection on nearby sites.
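
To make the Ne⋅s threshold concrete, the sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation under genic selection in a diploid population. This formula is brought in here purely as an illustration and is not part of the studies reviewed above. The function name, the choice to set the census size equal to Ne by default, and the example values are assumptions.

```python
import math


def fixation_probability(s, n_e, n_census=None):
    """Kimura's diffusion approximation for a new mutation:

        u = (1 - exp(-4 * Ne * s * p0)) / (1 - exp(-4 * Ne * s)),

    with initial frequency p0 = 1 / (2N) in a diploid population.
    Assumption: the census size N equals Ne unless given explicitly.
    Very large values of -4 * Ne * s (above roughly 700) would overflow.
    """
    if n_census is None:
        n_census = n_e
    p0 = 1.0 / (2.0 * n_census)
    if s == 0:
        return p0                      # neutral expectation
    big_s = 4.0 * n_e * s
    # expm1 keeps precision when the exponent is close to zero.
    return math.expm1(-big_s * p0) / math.expm1(-big_s)


def relative_to_neutral(s, n_e):
    """Fixation probability divided by the neutral value 1 / (2N)."""
    return fixation_probability(s, n_e) * (2.0 * n_e)
```

For a slightly deleterious mutation with s = -0.001, `relative_to_neutral(-0.001, 10000)` (Ne⋅|s| = 10) is vanishingly small, whereas `relative_to_neutral(-0.001, 100)` (Ne⋅|s| = 0.1) is close to the neutral value. A local reduction in Ne of the kind attributed to linkage interference therefore moves the same mutation from effectively selected to effectively neutral.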

The predictions for effects of selection at linked sites have been supported by empirical evidence. Measures of genetic variation around selected sites seem to be well explained by the effects of linkage interference and correlate with recombination rate at the genomic scale. Empirical assessment of the distribution of fitness effects (DFE) of new mutations finds strong evidence in support of the nearly neutral model, suggesting linked selection is likely to often cause the fixation of slightly deleterious alleles at the genomic level. For example, regions of low recombination have a higher genetic load in Drosophila melanogaster and maize, and they also show fewer adaptive substitutions in Drosophila melanogaster. Other non-recombining regions have also been noted to have increased non-adaptive substitutions such as the mating-type locus of Chlamydomonas reinhardtii and Microbotryum anther-smut fungus, the self-incompatibility locus of Arabidopsis, the gene-complex involved in colony organization in Solenopsis fire ants, and the morph gene-complexes in Heliconius butterflies and sparrows. Linked selection may even explain patterns of codon bias across single genes and is likely to contribute to degeneration of genes during cancer progression. Interference selection seems a process universal to regions of low or no recombination, and, in many species, the Y chromosome is the largest non-recombining region of the genome.

Molecular studies of Y chromosomes lend credence to the prediction that alleles on the Y are degenerating because of linked selection. The non-recombining sex chromosome indeed shows dramatically reduced levels of genetic diversity in birds and even on regions recently translocated to sex chromosomes in Drosophila. Model fitting and simulations suggest the reduction in genetic diversity on the Y can be effectively explained by background selection rather than invoking positive selection in Drosophila, humans and Rumex. Experimentally reduced rates of recombination across a synthetic Y chromosome in Drosophila also reduced the efficacy of selection. These results suggest linked selection played a significant role in the evolution and degeneration of Y chromosomes.

Similar to coding sequence degeneration, lowering and loss of gene expression seem to be common features of Y chromosome evolution. However, how lowered gene expression interacts with linkage interference remains unclear, and several hypotheses have been proposed. First, Y expression degeneration may simply be a direct symptom of linked selection. Under this hypothesis, Y alleles lose expression as their enhancers and promoters degenerate through the fixation of deleterious variants, which is permitted by the less efficient selection on the Y. Consistent with this hypothesis, regulatory regions may be under weak purifying selection and are therefore likely to degenerate faster than genic regions. Gene expression loss would then proceed at the same rate as coding sequence degeneration.

Second, expression loss may precede and permit coding sequence degeneration. If a Y allele loses expression, selection will no longer be able to act on that allele as the allele will be completely recessive. Gene expression loss can thus allow the Y to completely degenerate and even be lost. This process may be analogous to a reduction in the efficacy of selection with reduced dominance. In support of this hypothesis, chromosome-wide gene silencing precedes Y degeneration in Drosophila albomicans. If Y allele expression loss occurs early in degeneration, coding sequence degeneration may be a neutral side effect of gene expression loss.

The association between dominance, gene expression and linkage interference is in line with Haldane’s hypothesis that selection during the haploid stage (e.g. pollen) could slow degeneration of the Y. Slowed decay of the Y due to pollen- or ovule-expressed genes, known as haploid selection, may be able to account for the observation that X-Y heteromorphism in dioecious angiosperms is not especially common, occurring in only four families. The effect of haploid selection may be substantial when we consider that in plants around 60% of genes are expressed in pollen, the male haploid phase. As expected, pollen-expressed Y-linked genes have been shown to degenerate more slowly in Silene and Rumex than other Y-linked genes. Sex chromosomes in organisms with predominantly haploid lifecycles are similarly less degenerate, such as in the brown alga Ectocarpus and the liverworts, but not in mosses. Sex chromosome sequences involved in the haploid phase of animals are also highly constrained in mammals, while the pattern is more complex in birds, potentially due to female heterogamety. Selection in the haploid phase may therefore play an important part in slowing Y chromosome degeneration.

Because of the lowered chance of the fixation of beneficial alleles on the Y compared to the X due to stronger linked selection, Orr and Kim proposed that it is beneficial to silence the Y because its alleles are less likely to be well adapted than those on the X. In support of this hypothesis, Orr and Kim estimated that in Drosophila the greatest difference in fitness between the X and the Y is caused by differences in the fixation of beneficial mutations rather than of deleterious ones. Similarly, it may be advantageous to the genome to silence sex ratio distorters or other selfish genetic elements such as TEs that accumulate on the Y due to a lowered efficacy of selection. Under either of these ‘active-silencing’ models, Y chromosome degeneration may occur in part because of the accidental silencing of nearby genes, a process likely to be associated with methylation. Indeed, heterochromatin is known to be imperfect in its silencing, through a phenomenon known as ‘position-effect variegation’. There is evidence that position-effect variegation contributes to Y chromosome evolution in Drosophila, where it plays a role in sex-specific aging. Silencing of the Y may therefore be a process favoured by natural selection, and Y coding sequence degeneration may be a neutral process following silencing.

Overall, X and Y divergence is a common aspect of sex chromosome evolution. In some cases, X-Y divergence can lead to gross, cytological heteromorphism. A significant parameter likely to be involved in dictating the trajectory towards heteromorphism is the number of selected sites in the region within which X and Y recombination is absent. The effect of linkage interference on the Y is determined by the number of linked sites, and the effect of linkage interference on molecular degeneration of Y alleles is well supported by empirical evidence. The role of gene expression loss of Y alleles in degeneration remains less clear. The loss of expression may be an active process or a by-product of degeneration, but its impact on sequence evolution is important. The effects of pollen expression may be crucial in disentangling the relative contributions of expression loss and selection interference to Y degeneration, and to X-Y divergence and heteromorphism more broadly.
