ORCID Profile
0000-0003-3987-8884
Current Organisation
CSIRO
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 22-03-2022
Publisher: IEEE
Date: 04-2017
DOI: 10.1109/DCC.2017.39
Publisher: IEEE
Date: 03-2016
DOI: 10.1109/DCC.2016.79
Publisher: Oxford University Press (OUP)
Date: 02-05-2014
DOI: 10.1093/BIOINFORMATICS/BTU183
Abstract: Motivation: Next-generation sequencing technologies are revolutionizing medicine. Data from sequencing technologies are typically represented as a string of bases, an associated sequence of per-base quality scores and other metadata, and in aggregate can require a large amount of space. The quality scores show how accurate the bases are with respect to the sequencing process, that is, how confident the sequencer is of having called them correctly, and are the largest component in datasets in which they are retained. Previous research has examined how to store sequences of bases effectively; here we add to that knowledge by examining methods for compressing quality scores. The quality values originate in a continuous domain, and so if a fidelity criterion is introduced, it is possible to introduce flexibility in the way these values are represented, allowing lossy compression over the quality score data. Results: We present existing compression options for quality score data, and then introduce two new lossy techniques. Experiments measuring the trade-off between compression ratio and information loss are reported, including quantifying the effect of lossy representations on a downstream application that carries out single nucleotide polymorphism and insert/deletion detection. The new methods are demonstrably superior to other techniques when assessed against the spectrum of possible trade-offs between storage required and fidelity of representation. Availability and implementation: An implementation of the methods described here is available at canovas/libCSAM. Contact: rcanovas@student.unimelb.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer International Publishing
Date: 2017
Publisher: MDPI AG
Date: 21-05-2013
DOI: 10.3390/A6020319
Publisher: Association for Computing Machinery (ACM)
Date: 06-2012
Abstract: The use of dictionaries is a common practice among applications operating on huge RDF datasets. It allows long terms occurring in the RDF triples to be replaced by short IDs which reference them. This decision greatly compacts the dataset and mitigates the scalability issues underlying its management. However, the dictionary size is not negligible and the techniques used for its representation also suffer from scalability limitations. This paper focuses on this scenario by adapting compression techniques for string dictionaries to the case of RDF. We propose a novel technique, Dcomp, which can be tuned to represent the dictionary in compressed space (22–64%) and to perform basic lookup operations in a few microseconds (1–50 μs). In addition, we propose Dcomp as a basis for specific SPARQL query optimizations leveraging its ability for early FILTER resolution.
Publisher: BMJ
Date: 04-09-2020
DOI: 10.1136/ANNRHEUMDIS-2020-217421
Abstract: Juvenile idiopathic arthritis (JIA) is an autoimmune disease and a common cause of chronic disability in children. Diagnosis of JIA is based purely on clinical symptoms, which can be variable, leading to diagnosis and treatment delays. Despite JIA having substantial heritability, the construction of genomic risk scores (GRSs) to aid or expedite diagnosis has not been assessed. Here, we generate GRSs for JIA and its subtypes and evaluate their performance. We examined three case/control cohorts (UK, US-based and Australia) with genome-wide single nucleotide polymorphism (SNP) genotypes. We trained GRSs for JIA and its subtypes using lasso-penalised linear models in cross-validation on the UK cohort, and externally tested them in the other cohorts. The JIA GRS alone achieved cross-validated area under the receiver operating characteristic curve (AUC)=0.670 in the UK cohort and externally-validated AUCs of 0.657 and 0.671 in the US-based and Australian cohorts, respectively. In logistic regression of case/control status, the corresponding odds ratios (ORs) per standard deviation (SD) of GRS were 1.831 (1.685 to 1.991) and 2.008 (1.731 to 2.345), and were unattenuated by adjustment for sex or the top 10 genetic principal components. Extending our analysis to JIA subtypes revealed that the enthesitis-related JIA had both the longest time-to-referral and the subtype GRS with the strongest predictive capacity overall across data sets: AUCs of 0.82 in UK, 0.84 in Australian and 0.70 in US-based. The particularly common oligoarthritis JIA also had a GRS that outperformed those for JIA overall, with AUCs of 0.72, 0.74 and 0.77, respectively. A GRS for JIA has potential to augment clinical JIA diagnosis protocols, prioritising higher-risk individuals for follow-up and treatment. Consistent with JIA heterogeneity, subtype-specific GRSs showed particularly high performance for enthesitis-related and oligoarthritis JIA.
Publisher: Oxford University Press (OUP)
Date: 27-09-2022
Abstract: Epidemiological studies report the beneficial effects of habitual coffee consumption on incident arrhythmia, cardiovascular disease (CVD), and mortality. However, the impact of different coffee preparations on cardiovascular outcomes and survival is largely unknown. The aim of this study was to evaluate associations between coffee subtypes and incident outcomes, utilizing the UK Biobank. Coffee subtypes were defined as decaffeinated, ground, and instant, then divided into 0, <1, 1, 2–3, 4–5, and >5 cups/day, and compared with non-drinkers. Cardiovascular disease included coronary heart disease, cardiac failure, and ischaemic stroke. Cox regression modelling with hazard ratios (HRs) assessed associations with incident arrhythmia, CVD, and mortality. Outcomes were determined through ICD codes and death records. A total of 449 563 participants (median 58 years, 55.3% females) were followed over 12.5 ± 0.7 years. Ground and instant coffee consumption was associated with a significant reduction in arrhythmia at 1–5 cups/day but not for decaffeinated coffee. The lowest risk was 4–5 cups/day for ground coffee [HR 0.83, confidence interval (CI) 0.76–0.91, P < 0.0001] and 2–3 cups/day for instant coffee (HR 0.88, CI 0.85–0.92, P < 0.0001). All coffee subtypes were associated with a reduction in incident CVD (the lowest risk was 2–3 cups/day for decaffeinated, P = 0.0093; ground, P < 0.0001; and instant coffee, P < 0.0001) vs. non-drinkers. All-cause mortality was significantly reduced for all coffee subtypes, with the greatest risk reduction seen with 2–3 cups/day for decaffeinated (HR 0.86, CI 0.81–0.91, P < 0.0001), ground (HR 0.73, CI 0.69–0.78, P < 0.0001), and instant coffee (HR 0.89, CI 0.86–0.93, P < 0.0001). Decaffeinated, ground, and instant coffee, particularly at 2–3 cups/day, were associated with significant reductions in incident CVD and mortality. Ground and instant but not decaffeinated coffee was associated with reduced arrhythmia.
Publisher: Elsevier BV
Date: 03-2016
Publisher: Springer Science and Business Media LLC
Date: 25-01-2023
DOI: 10.1186/S12936-022-04430-0
Abstract: Protozoan parasites are known to attach a specific and diverse group of proteins to their plasma membrane via a GPI anchor. In malaria parasites, GPI-anchored proteins (GPI-APs) have been shown to play an important role in host–pathogen interactions and a key function in host cell invasion and immune evasion. Because of their immunogenic properties, some of these proteins have been considered as malaria vaccine candidates. However, identification of all possible GPI-APs encoded by these parasites remains challenging due to their sequence diversity and limitations of the tools used for their characterization. The FT-GPI software was developed to detect GPI-APs based on the presence of a hydrophobic helix at both ends of the premature peptide. FT-GPI was implemented in C++ and applied to study the GPI-proteome of 46 isolates of the order Haemosporida. Using the GPI proteome of Plasmodium falciparum strain 3D7 and Plasmodium vivax strain Sal-1, a heuristic method was defined to select the most sensitive and specific FT-GPI software parameters. FT-GPI enabled revision of the GPI-proteome of P. falciparum and P. vivax, including the identification of novel GPI-APs. Orthology- and synteny-based analyses showed that 19 of the 37 GPI-APs found in the order Haemosporida are conserved among Plasmodium species. Our analyses suggest that gene duplication and deletion events may have contributed significantly to the evolution of the GPI proteome, and its composition correlates with speciation. FT-GPI-based prediction is a useful tool for mining GPI-APs and gaining further insights into their evolution and sequence diversity. This resource may also help identify new protein candidates for the development of vaccines for malaria and other parasitic diseases.
Publisher: ACM
Date: 26-03-2012
Publisher: Society for Industrial and Applied Mathematics
Date: 16-01-2013
Publisher: Association for Computing Machinery (ACM)
Date: 30-10-2021
DOI: 10.1145/3481638
Abstract: The Lempel-Ziv 78 (LZ78) and Lempel-Ziv-Welch (LZW) text factorizations are popular, not only for bare compression but also for building compressed data structures on top of them. Their regular factor structure makes them computable within space bounded by the compressed output size. In this article, we carry out the first thorough study of low-memory LZ78 and LZW text factorization algorithms, introducing more efficient alternatives to the classical methods, as well as new techniques that can run within less memory space than that necessary to hold the compressed file. Our results build on hash-based representations of tries that may have independent interest.
Publisher: Oxford University Press (OUP)
Date: 18-08-2016
DOI: 10.1093/BIOINFORMATICS/BTW543
Abstract: Motivation: Next-generation sequencing machines produce vast amounts of genomic data. For the data to be useful, it is essential that it can be stored and manipulated efficiently. This work responds to the combined challenge of compressing genomic data, while providing fast access to regions of interest, without necessitating decompression of whole files. Results: We describe CSAM (Compressed SAM format), a compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the compressed information. They generate more compact lossless representations than BAM, which is currently the preferred lossless compressed SAM-equivalent format, and are self-contained, that is, they do not depend on any external resources to compress or decompress SAM files. Availability and Implementation: An implementation is available at canovas/libCSAM. Contact: canovas-ba@lirmm.fr Supplementary Information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Berlin Heidelberg
Date: 2010
No related grants have been discovered for Rodrigo Canovas.