Twenty-one of the remaining 39 marker pairs that disagree with EquCab2 agree with EquCab3, while 18 do not have both pairs mapping to EquCab3. Given the multiple, orthogonal data types and differing assembly strategies used to construct EquCab2 and EquCab3, we suggest that some or all of these marker pairs are oriented correctly in both assemblies, but incorrectly on the radiation hybrid map.
Of the remaining 43 marker pairs that are misoriented on EquCab3 but not on EquCab2, 36 of these pairs do not have both markers mapping to EquCab2, leaving only seven marker pairs agreeing with EquCab2 but not EquCab3. Given that the radiation hybrid map was used to guide the assembly of EquCab2, we find this level of disagreement acceptable.
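The marker-pair comparison above can be illustrated with a minimal sketch. It assumes each marker has a position on the radiation hybrid map and, where it maps, a chromosome and coordinate on an assembly, and that both coordinate systems run in the same direction along a chromosome. The function name, data layout, and status labels are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, Tuple

# (chromosome, position) of a marker in one coordinate system.
Placement = Tuple[str, float]


def pair_status(a: str, b: str,
                rh_map: Dict[str, Placement],
                assembly: Dict[str, Placement]) -> str:
    """Classify a marker pair as 'agree', 'disagree', or 'unmapped'.

    Assumption: the pair is present on the radiation hybrid map, and the
    assembly and the map are oriented the same way along each chromosome.
    """
    if a not in assembly or b not in assembly:
        # Both markers must map to the assembly to judge orientation.
        return "unmapped"
    chrom_a, pos_a = assembly[a]
    chrom_b, pos_b = assembly[b]
    if chrom_a != chrom_b:
        return "disagree"
    rh_forward = rh_map[a][1] < rh_map[b][1]
    asm_forward = pos_a < pos_b
    return "agree" if rh_forward == asm_forward else "disagree"


# Toy data, invented for illustration only.
rh_map = {"m1": ("chr1", 10.0), "m2": ("chr1", 12.5), "m3": ("chr1", 20.0)}
assembly_a = {"m1": ("chr1", 1000), "m2": ("chr1", 5000), "m3": ("chr1", 3000)}
assembly_b = {"m1": ("chr1", 1000), "m2": ("chr1", 5000)}
```

Running the same pair against two assemblies separates pairs that disagree with only one assembly from pairs that cannot be compared because a marker fails to map.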
We used two methods to evaluate the completeness of our genome: universal ortholog analysis and comparative annotation. The Comparative Annotation Toolkit (CAT) is a software pipeline that leverages whole-genome alignments, existing annotations, and comparative gene prediction tools to simultaneously annotate multiple genomes, defining orthologous relationships and discovering gene family expansion and contraction. CAT also diagnoses assembly quality by investigating the rate of gene model-breaking indels seen in transcript projections from a reference, as well as the rate of transcript projections that map in a disjointed fashion.
We performed comparative annotation of EquCab2 and EquCab3 using the genomes of pig, cow, white rhinoceros, elephant, and human.
These results indicate that EquCab3 is a more complete and contiguous assembly than EquCab2. Most published assemblies of diploid organisms are pseudo-haploidizations produced by arbitrarily choosing between the two alleles at each heterozygous site in the genome. For each phase block inferred by longranger, rather than arbitrarily choosing which haplotype to include in the final assembly, we chose the allele which is most common among four Thoroughbreds, the two FAANG horses, and data from two other Thoroughbreds from an earlier study by Sarkar et al.
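As a rough illustration of the haplotype choice described above, the sketch below scores the two haplotypes of one phase block by how often their alleles are observed in a panel of other horses, and keeps the better-supported one. The scoring scheme, the tie-breaking rule (ties go to the first haplotype), and all names are assumptions made for illustration; the source does not specify how support was tallied.

```python
from collections import Counter
from typing import List, Sequence


def choose_haplotype(hap1: Sequence[str],
                     hap2: Sequence[str],
                     panel_alleles: Sequence[Sequence[str]]) -> List[str]:
    """Return the haplotype whose alleles are more common in the panel.

    hap1[i] and hap2[i] are the two phased alleles at heterozygous site i
    of a single phase block; panel_alleles[i] lists every allele observed
    across the panel horses at that site.
    """
    score1 = score2 = 0
    for allele1, allele2, observed in zip(hap1, hap2, panel_alleles):
        counts = Counter(observed)
        score1 += counts[allele1]
        score2 += counts[allele2]
    # Assumption: ties are resolved in favour of the first haplotype.
    return list(hap1) if score1 >= score2 else list(hap2)


# Toy phase block with two heterozygous sites (invented values).
hap1 = ["A", "C"]
hap2 = ["G", "T"]
panel_common_hap1 = [["A", "A", "G", "A"], ["C", "T", "C", "C"]]
panel_common_hap2 = [["G", "G", "G", "A"], ["T", "T", "C", "T"]]
```

Scoring whole phase blocks rather than single sites keeps the alleles within a block on the same haplotype, which matters once the data have been phased.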
This new genome represents an improvement for the horse reference in terms of both composition and contiguity.
Going forward, the lens through which this reference will be viewed will be as an alignment target for the vast amount of high-throughput sequence data that will continue to be generated for the horse and other related species. All equine data produced by any of these technologies should be well served going forward. Illumina short reads, currently the most common data types for genetic and genomic studies, for two Thoroughbreds not related to Twilight or each other have been demonstrated to map to the new reference at an average rate of In a comparative genomics analysis, more gene orthologs were found, and for those that were found, the coverage of the homologous transcript sequence was more complete.
The new long-range sequence data not only improved the contiguity of the genome, but allowed us to phase the genomic data for Twilight. Finally, the regions added for the genome were higher in GC content, which will enable a better characterization of both genetic variation and epigenetic status in GC-rich regulatory regions for the horse. This represents a culmination of a project conceived and begun in with the support of the equine genomics community. Although it will certainly not be the last reference genome for the domestic horse produced for public annotation, it should foster genetic and genomic discoveries for years to come.
Sample collection and DNA extraction : A single reference horse was used for this study. Venous jugular whole-blood samples were collected from Twilight into evacuated tubes containing heparin. In compliance with Trace Archive rules, individual shell scripts were executed to download the maximum 30, records per search request. This is described in more detail in Rebolledo-Mendez et al The fragment sizes were confirmed by measuring the distribution of insert sizes in the mapped MiSeq dataset. The fastq read files were generated with the bcl2fastq v1. For the size selection, the sample was run on a 0.
The remainder of the reads were used to generate an error corrected subreads file using canu 28 version 1. These two datasets were used in the PBJelly runs described below. Hi-C library : We generated a Hi-C library with primary fibroblasts from Twilight using a Hi-C protocol modified such that the chromatin immobilization took place on magnetic beads. We crosslinked the fibroblasts in formaldehyde, and lysed, washed, and resuspended as described by Lieberman-Aiden et al.
The data were analyzed and assembled using the 10x Genomics Supernova version 1. The total length of sequences assembled into super reads was 4.
These super read sequences had a contig N50 of nucleotides. Celera Assembler : The Celera Assembler 22 , 31 , 32 , version 8. Identifying misassemblies : In order to identify misassemblies in the HiRise assembly relative to EquCab2, we aligned the HiRise output scaffolds to EquCab2 using nucmer with default parameters. In every place where the alignment indicated a difference in order and orientation of scaffolds between the two assemblies, we used every available data type to resolve the discrepancy and determine which was correct.
Our strategies included aligning BAC-end pairs from a half-brother of Twilight 2 to the assemblies using bwa mem with default parameters 34 , assessing concordance with the physical map, looking for split genes predicted by CAT 27 , aligning coding sequences of any genes in the region to the assemblies using gmap with default parameters 35 , and examining heatmaps of long-range read pairs mapping to the assembly generated by the HiRise and longranger pipelines. Assigning scaffolds to chromosomes : We used a previously published radiation hybrid map 20 to assign scaffolds to chromosomes.
Mitochondrial assembly : Illumina data were adapter trimmed using SeqPrep2. A subset of 24 million randomly selected Illumina reads was created using seqtk sample. The subsetted Illumina reads were used as input into an iterative assembler mia version 1. Sanger data were used to determine the correct number of 8-mer repeats in the control region. Sanger reads were trimmed using Figaro version 1.
Alignments were manually inspected by eye using IGV version 2. Sanger reads that aligned to the control region were extracted, visualized by eye, and compared to the initial mitochondrial assembly. One Sanger read spanned both sides of the control region 8-mer repeats and was used to update the number of 8-mer repeats in the mitochondrial assembly. The updated mitochondrial assembly sequence was used as the reference sequence for an assembly with mia, using 40 million randomly selected Illumina reads, with a slope of and intercept of These data can be found at the Sequence Read Archive. SAMtools 45 version 0.
Picard version 1. The differing bases were likely contributions from the sequence data generated on other platforms used for the assembly such as the Sanger or PacBio data. To evaluate these positions, we performed variant discovery and genotyping with the UnifiedGenotyper using the Twilight PE data, the two FAANG thoroughbreds, and two additional thoroughbreds from Sarkar et al.
The UnifiedGenotyper was used in discovery mode on the cohort. The resulting variant call format file was then parsed with custom java software 48 looking for positions at which the Twilight data produced a homozygous genotype differing from the reference. The genotypes for the other animals were then queried at those positions. If the reference allele was detected in one of the other horses, the reference nucleotide at that position was not changed, with the idea that the second allele was either undersampled in the Illumina dataset or that a second allele was identified in the Sanger or PacBio sequence data.
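The decision rule described above can be sketched as a small function. This is a minimal sketch that assumes diploid genotypes given as allele pairs, and assumes that a position is treated as a candidate for correction when the rule does not protect the reference base. The function name and data layout are hypothetical; the authors used custom Java software whose details are not shown here.

```python
from typing import Iterable, Tuple

Genotype = Tuple[str, str]  # two alleles of a diploid genotype call


def should_update_reference(ref: str,
                            twilight: Genotype,
                            others: Iterable[Genotype]) -> bool:
    """Return True if the reference base is a candidate for correction.

    A position qualifies only when Twilight is homozygous for a
    non-reference allele and none of the other horses carries the
    reference allele.
    """
    first, second = twilight
    if first != second or first == ref:
        # Heterozygous, or homozygous for the reference allele: keep it.
        return False
    for genotype in others:
        if ref in genotype:
            # The reference allele is seen elsewhere in the cohort,
            # so the reference nucleotide is left unchanged.
            return False
    return True
```

For example, a reference `A` with Twilight called `G/G` is left unchanged if any other horse is `A/G`, and is flagged only if every other horse lacks `A`.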
Removal of microbial contamination : To build microbial sequence databases, all bacterial, viral, and fungal reference genomes were downloaded from RefSeq. For each of the three databases (bacteria, viruses, and fungi), the sequences were first masked with DustMasker. Kraken v1. Contigs with at least one exact mer match were considered microbial contaminants and removed from the reference sequence. A total of 41 contigs were removed in this way. Removal of small contigs : All scaffolds smaller than bases in length were removed from the assembly that was submitted for annotation.
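The idea behind the contamination screen, flagging any contig that shares at least one exact k-mer with a microbial database, can be sketched in plain Python. This is not how Kraken is implemented: it ignores reverse complements, low-complexity masking, and Kraken's indexing, and the k-mer length `k` is a hypothetical parameter because the value used is not given here.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (forward strand only)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(genomes: Iterable[str], k: int) -> Set[str]:
    """Collect all k-mers found in the microbial reference sequences."""
    return {kmer for genome in genomes for kmer in kmers(genome, k)}


def split_contaminants(contigs: Dict[str, str],
                       microbial: Set[str],
                       k: int) -> Tuple[Dict[str, str], List[str]]:
    """Separate clean contigs from those with at least one exact match."""
    clean: Dict[str, str] = {}
    flagged: List[str] = []
    for name, seq in contigs.items():
        if any(kmer in microbial for kmer in kmers(seq, k)):
            flagged.append(name)
        else:
            clean[name] = seq
    return clean, flagged


# Toy example with an artificially small k (invented sequences).
microbial_db = build_kmer_set(["ACGTACGT"], k=4)
toy_contigs = {"c1": "TTTTACGTTT", "c2": "GGGGGGGG"}
clean_contigs, flagged_contigs = split_contaminants(toy_contigs, microbial_db, k=4)
```

A single-match threshold is deliberately strict, so masking low-complexity sequence in the databases first helps avoid flagging contigs on uninformative repeats.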
The contig and scaffold N50s for what was submitted were 4. The phased variant file produced was then used to modify individual variant positions to conform to the haplotype whose allele was most common among the FAANG horses, and two other thoroughbreds described above.
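For readers unfamiliar with the N50 statistic reported for contigs and scaffolds, the sketch below computes it from a list of sequence lengths: the length L such that sequences of length L or longer contain at least half of the assembled bases. The length filter mirrors the removal of small scaffolds, but `min_length` is a hypothetical parameter and the example lengths are invented.

```python
from typing import Iterable, List


def filter_short(lengths: Iterable[int], min_length: int) -> List[int]:
    """Drop sequences shorter than min_length (hypothetical threshold)."""
    return [length for length in lengths if length >= min_length]


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches
        # half of the total assembled bases.
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")
```

For lengths 2, 3, 4, 5, and 6 the total is 20, and the two longest sequences (6 + 5 = 11) already cover half of it, so the N50 is 5.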
The default setting of 25 was used for the minimum gap setting. Comparative annotation : For this analysis, a progressiveCactus 51 alignment of EquCab2 and EquCab3 was performed with pig susScr3 , cattle bosTau8 , white rhinoceros cerSim1 , elephant loxAfr3 , and human hg The guide tree was Human CAT 27 was then run using the Ensembl V89 annotation of pig as the source transcript set. No RNA-seq data were provided, so no transcript cleanup steps or comparative gene predictions were performed.
Read filtering and counting : Reads were counted using htsjdk version 2. Only mapped reads (getReadUnmappedFlag is false) were counted, and mapping locations that were not the primary mapping location of a read were filtered out by requiring getNotPrimaryAlignmentFlag to be false.
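The read filter described above corresponds to two bits of the SAM FLAG field: 0x4 (read unmapped) and 0x100 (not the primary alignment). The sketch below applies the same test to raw integer flags in Python rather than through htsjdk; the function names are hypothetical and the example flags are invented.

```python
from typing import Iterable

UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
NOT_PRIMARY = 0x100    # SAM FLAG bit: secondary (not primary) alignment


def is_counted(flag: int) -> bool:
    """True for mapped records that are a read's primary alignment."""
    return not (flag & UNMAPPED) and not (flag & NOT_PRIMARY)


def count_reads(flags: Iterable[int]) -> int:
    """Count the alignment records that pass the filter."""
    return sum(1 for flag in flags if is_counted(flag))


# 0 and 16 are primary mapped records, 4 is unmapped, 256 and 272 are
# secondary alignments, 2048 is a supplementary alignment.
example_flags = [0, 16, 4, 256, 272, 2048]
```

Supplementary alignments (0x800) pass this filter because the text names only the unmapped and not-primary flags; whether they were counted in the original analysis is not stated.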
For normalization across samples, fastq files were downsampled to 6M reads using seqtk. Low complexity sequences were removed using PRINSEQ 54 following bwa mapping 36 with parameters optimized for aDNA: aln algorithm, seed disable flag, and minimum mapping phred quality of

The mitochondrial sequence has been deposited into GenBank accession number MH Data from Sarkar et al.

Outram, A. The earliest horse harnessing and milking. Science.
Wade, C. Genome sequence, comparative analysis, and population genetics of the domestic horse.
Coleman, S. Structural annotation of equine protein-coding genes determined by mRNA sequencing.
Vanderman, K.
Schaefer, R. BMC Genom.
Petersen, J. Genome-wide analysis reveals selection for important traits in domestic horse breeds. PLoS Genet.
McCue, M. A high density SNP array for the domestic horse and extant Perissodactyla: utility for association mapping, genetic diversity, and phylogeny studies.
Bellone, R. Pleiotropic effects of pigmentation genes in horses.
Evidence for a retroviral insertion in TRPM1 as the cause of congenital stationary night blindness and leopard complex spotting in the horse.
Brooks, S.
Staiger, E. Host genetic influence on papillomavirus-induced tumors in the horse. Cancer.
Sarkar, S. A missense mutation in damage-specific DNA binding protein 2 is a genetic risk factor for limbal squamous cell carcinoma in horses.
Gaunitz, C.
Schubert, M. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Natl Acad. USA.
Librado, P. Tracking the origins of Yakutian horses and the genetic basis for their fast adaptation to subarctic environments.
Ancient genomic changes associated with domestication of the horse.
Rebolledo-Mendez, J. Comparison of the equine reference sequence with its Sanger source data and new Illumina reads.
Hestand, M. Annotation of the protein coding regions of the equine genome.
Raudsepp, T. A 4, marker integrated physical and comparative map of the horse genome. Genome Res.
Zimin, A. Bioinformatics 29.
Miller, J. Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 24.
Putnam, N. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage.
Marks, P.
Burns, E. Generation of an Equine Biobank to be used for functional annotation of animal genomes project.
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31.
Fiddes, I.
Berlin, K. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
Lieberman-Aiden, E. Comprehensive mapping of long-range interactions reveals folding principles of the human genome.
Deng, X. Bipartite structure of the inactive mouse X chromosome. Genome Biol.
Myers, E. A whole-genome assembly of Drosophila.
Koren, S. Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Kurtz, S. Versatile and open software for comparing large genomes.
Li, H. GN], v2.
Wu, T. Bioinformatics 21.
Fast and accurate short read alignment with Burrows-Wheeler transform.