ARDC Research Link Australia

ORCID Profile
0000-0003-0998-2859

Current Organisations
National Institute of Advanced Industrial Science and Technology (AIST), University of Tokyo

Publications

Publication

The Abundance of Short Proteins in the Mammalian Proteome

Publisher: Public Library of Science (PLoS)

Date: 28-04-2006

DOI: 10.1371/JOURNAL.PGEN.0020052

Publication

Genome-wide analysis of mammalian promoter architecture and evolution

Publisher: Springer Science and Business Media LLC

Date: 28-04-2006

DOI: 10.1038/NG1789

Abstract: Mammalian promoters can be separated into two classes, conserved TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.

Publication

Adaptive seeds tame genomic sequence comparison

Publisher: Cold Spring Harbor Laboratory

Date: 05-01-2011

DOI: 10.1101/GR.113985.110

Abstract: The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.

Publication

RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE)

Publisher: Springer Science and Business Media LLC

Date: 25-04-2014

DOI: 10.1186/1471-2164-15-269

Publication

Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function

Publisher: Elsevier BV

Date: 2006

DOI: 10.1016/J.TIG.2005.10.003

Abstract: The mammalian transcriptome contains many non-protein-coding RNAs (ncRNAs), but most of these are of unclear significance and lack strong sequence conservation, prompting suggestions that they might be non-functional. However, certain long functional ncRNAs such as Air and Xist are also poorly conserved. In this article, we systematically analyzed the conservation of several groups of functional ncRNAs, including miRNAs, snoRNAs and longer ncRNAs whose function has been either documented or confidently predicted. As expected, miRNAs and snoRNAs were highly conserved. By contrast, the longer functional non-micro, non-sno ncRNAs were much less conserved with many displaying rapid sequence evolution. Our findings suggest that longer ncRNAs are under the influence of different evolutionary constraints and that the lack of conservation displayed by the thousands of candidate ncRNAs does not necessarily signify an absence of function.

Publication

Transcript Annotation in FANTOM3: Mouse Gene Catalog Based on Physical cDNAs

Publisher: Public Library of Science (PLoS)

Date: 28-04-2006

DOI: 10.1371/JOURNAL.PGEN.0020062

Publication

Incorporating sequence quality data into alignment improves DNA read mapping

Publisher: Oxford University Press (OUP)

Date: 27-01-2010

DOI: 10.1093/NAR/GKQ010

Publication

Discovering Sequence Motifs with Arbitrary Insertions and Deletions

Publisher: Public Library of Science (PLoS)

Date: 09-05-2008

DOI: 10.1371/JOURNAL.PCBI.1000071

Publication

A promoter-level mammalian expression atlas

Publisher: Springer Science and Business Media LLC

Date: 03-2014

DOI: 10.1038/NATURE13182

Publication

MEME SUITE: tools for motif discovery and searching

Publisher: Oxford University Press (OUP)

Date: 20-05-2009

DOI: 10.1093/NAR/GKP335

Publication

Discrimination of Non-Protein-Coding Transcripts from Protein-Coding mRNA

Publisher: Informa UK Limited

Date: 2006

DOI: 10.4161/RNA.3.1.2789

Abstract: Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.

Publication

Splicing bypasses 3′ end formation signals to allow complex gene architectures

Publisher: Elsevier BV

Date: 11-2007

DOI: 10.1016/J.GENE.2007.08.012

Abstract: Many genes are arranged in complex overlapping and interlaced patterns in eukaryotic genomes. It is unclear whether or how such genes can avoid interference from each other's RNA processing signals and retain distinct identities. This puzzle applies particularly to 3' end formation sites, which inherently terminate the transcript, and thus act as boundaries between adjacent genes. We hypothesise that the transcript processing machinery can bypass 3' end formation sites by splicing out an intron surrounding the site. We confirm a prediction of this hypothesis: the likelihood of transcripts extending beyond 3' end sites depends on the strength of 3' end formation signals located in exons in the mature transcript, but not of those in introns that are spliced out of the transcript. This bypassing mechanism permits nested and interleaved gene architectures, as well as fusion transcripts that combine exons from adjacent genes.

Publication

Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome

Publisher: Cold Spring Harbor Laboratory

Date: 12-12-2006

DOI: 10.1101/GR.4200206

Abstract: Recent large-scale analyses of mainly full-length cDNA libraries generated from a variety of mouse tissues indicated that almost half of all representative cloned sequences did not contain an apparent protein-coding sequence, and were putatively derived from non-protein-coding RNA (ncRNA) genes. However, many of these clones were singletons and the majority were unspliced, raising the possibility that they may be derived from genomic DNA or unprocessed pre-mRNA contamination during library construction, or alternatively represent nonspecific “transcriptional noise.” Here we show, using reverse transcriptase-dependent PCR, microarray, and Northern blot analyses, that many of these clones were derived from genuine transcripts of unknown function whose expression appears to be regulated. The ncRNA transcripts have larger exons and fewer introns than protein-coding transcripts. Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but internal priming sites within longer transcripts, only a minority of which is encompassed by known genes. A significant proportion of these transcripts exhibit tissue-specific expression patterns, as well as dynamic changes in their expression in macrophages following lipopolysaccharide stimulation. Taken together, the data provide strong support for the conclusion that ncRNAs are an important, regulated component of the mammalian transcriptome.

Publication

Pseudo–Messenger RNA: Phantoms of the Transcriptome

Publisher: Public Library of Science (PLoS)

Date: 28-04-2006

DOI: 10.1371/JOURNAL.PGEN.0020023

Publication

Clusters of Internally Primed Transcripts Reveal Novel Long Noncoding RNAs

Publisher: Public Library of Science (PLoS)

Date: 28-04-2006

DOI: 10.1371/JOURNAL.PGEN.0020037

Related Organisations

Organisation

National Institute of Advanced Industrial Science and Technology (AIST)

Location: Japan

Organisation

University of Tokyo

Location: Japan

Related Funding Activities

No related grants have been discovered for Martin Frith.

Martin Frith

Researcher
