Tasmanian Devil Genome Project: Our Research

Our two laboratories (Schuster and Miller) began working together in November 2005, and were the first to apply so-called next-generation sequencing methods to an extinct species [1]. (Numbers in square brackets refer to the publications listed below.) We developed methods to extract DNA sequences from hair shafts [2], an approach that is important for sequencing museum specimens without visibly damaging them. Our study of entire mitochondrial sequences from a number of woolly mammoths [3] remains one of the most extensive analyses of its kind for any species, living or extinct. The natural culmination of this work on the woolly mammoth was to sequence its entire genome [4]. We have continued to sharpen our skills at sequencing museum specimens, with publications on the woolly rhino [5], the Tasmanian tiger [6] (a relative of the Tasmanian devil that went extinct in 1936), and an ancient polar bear [7].

An important aspect of our work has been to develop novel computational methods that are applicable to species conservation. A method that we developed for the Tasmanian devil addresses problems that arose because the species is quite different from any previously sequenced species. For that reason, there was no pre-existing "reference genome assembly" to which the new sequences could be compared. Even when we had only very small amounts of sequence data from Cedric and Spirit, roughly equivalent to single-fold coverage (i.e., when the total length of all obtained sequence fragments approximately equaled the genome's length), we were able to identify SNPs (single-nucleotide polymorphisms) that served as the basis for the population-structure analysis reported in the paper. We published the computer programs [8] to make them available for other projects.
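To make the notion of "single-fold coverage" concrete, here is a minimal Python sketch of the arithmetic described above: total fragment length divided by genome length. The function name and the toy numbers are illustrative assumptions, not values from our data.

```python
from typing import Iterable


def fold_coverage(fragment_lengths: Iterable[int], genome_length: int) -> float:
    """Return the total number of sequenced bases divided by the genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(fragment_lengths) / genome_length


# Toy numbers (assumed for illustration): 30 fragments of 100 bases each
# against a 3,000-base genome give exactly single-fold coverage.
toy_coverage = fold_coverage([100] * 30, 3000)
```

A value near 1.0 corresponds to the single-fold situation: on average each genomic position is covered about once, so many positions are seen only once or not at all.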

Knowing that a number of individual Tasmanian devils were going to be genotyped for a large number of SNPs (i.e., the nucleotides present at those genomic locations in each animal would be determined), we developed computer programs that could use such data to select individuals for a founder population [9]. The hope was to learn how many animals could be kept in the "insurance population" (also known as the "captive breeding program"), then genotype a substantially larger number of individuals, i.e., the candidates for founding the insurance population. The remaining ingredient would be to decide on the "target distribution of alleles" at those SNPs. For example, a particular genomic position might contain either C or G, and we would somehow decide that our goal was to have a C 75% of the time. That is, if there were to be 200 animals in the captive breeding program, then there would be 400 nucleotides at that position (since every individual has two copies), out of which 300 (i.e., 75%) should be C. How would we determine the "right" fraction for each position? One approach would be to always pick 50%, which amounts to maximizing genetic diversity. A more sophisticated and difficult approach would be to sequence a number of genomes from museum specimens, and use that information to estimate the frequency before recent population bottlenecks. In any case, once those desired frequencies are set, the computational method picks the specified number of individuals from the genotyped set that comes closest to fitting the desired set of allele frequencies.
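The selection step can be illustrated with a small Python sketch. It is not the optimization method published in [9]; it is a simple greedy heuristic, written only to show the shape of the problem, and greedy choices are not guaranteed to find the best possible set. The genotype encoding (0, 1, or 2 copies of one chosen allele per SNP), the squared-deviation score, and all function and animal names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def target_allele_count(n_animals: int, target_frequency: float) -> int:
    """Number of allele copies wanted at one SNP (each animal carries two copies)."""
    return round(2 * n_animals * target_frequency)


def squared_deviation(counts: Sequence[int], n_selected: int,
                      targets: Sequence[float]) -> float:
    """Sum over SNPs of (achieved allele frequency - target frequency) squared."""
    return sum((c / (2 * n_selected) - t) ** 2 for c, t in zip(counts, targets))


def select_founders(genotypes: Dict[str, Sequence[int]],
                    targets: Sequence[float], k: int) -> List[str]:
    """Greedily pick k individuals whose pooled allele frequencies approach the targets."""
    if not 0 < k <= len(genotypes):
        raise ValueError("k must be between 1 and the number of candidates")
    remaining = dict(genotypes)
    chosen: List[str] = []
    counts = [0] * len(targets)  # allele copies accumulated so far, per SNP
    while len(chosen) < k:
        def score(name: str) -> float:
            # Deviation from the targets if this candidate were added next.
            trial = [c + g for c, g in zip(counts, remaining[name])]
            return squared_deviation(trial, len(chosen) + 1, targets)

        # Sorting the names makes tie-breaking deterministic (alphabetical).
        best = min(sorted(remaining), key=score)
        counts = [c + g for c, g in zip(counts, remaining.pop(best))]
        chosen.append(best)
    return chosen


# Hypothetical candidates genotyped at a single C/G SNP (entries count copies of C).
candidates = {"a": [2], "b": [2], "c": [1], "d": [0], "e": [1]}
founders = select_founders(candidates, targets=[0.75], k=2)
```

With the 200-animal example from the text, `target_allele_count(200, 0.75)` gives the 300 copies of C mentioned above. In the toy data, the two selected animals together carry three C alleles out of four, which matches the 75% target exactly.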

To apply this approach to the Tasmanian devil, we would first need to select a set of SNPs. We could now do a better job than we did to get the 1536 SNPs discussed in the paper, since we subsequently generated much additional sequence data, created a preliminary genome assembly, called (predicted) about 1 million SNPs, and made them available on the Galaxy website along with tools to help design genotyping experiments. The selected SNPs would then need to be genotyped in a large number of healthy animals, and our computer program applied to the results. (The genotyping experiments discussed in the paper were performed on anonymous individuals, many of whom are no longer alive.)

Some of our publications:

[1] Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA

[2] Whole-genome shotgun sequencing of mitochondria from ancient hair shafts

[3] Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes

[4] Sequencing the nuclear genome of the extinct woolly mammoth

[5] Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

[6] The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)

[7] Complete mitochondrial genome of a Pleistocene jawbone unveils the origin of polar bear

[8] Calling SNPs without a reference sequence

[9] Optimization methods for selecting founder individuals for captive breeding or reintroduction of endangered species

Our websites:

The Schuster Lab

The Miller Lab

The Mammoth Genome Project

Tasmanian Tiger Sequencing Project

Workshop on Genomic Methods for Species Conservation