ORCID Profile
0000-0002-9034-9905
Current Organisations
UNSW Sydney, Garvan Institute of Medical Research, University of New South Wales
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Applications in life sciences | Bioinformatics and computational biology | Electronics sensors and digital hardware | Energy-efficient computing | Bioinformatic methods development | Digital processor architectures
Publisher: IEEE
Date: 12-2015
Publisher: IEEE
Date: 12-2016
Publisher: Sri Lanka Journals Online (JOL)
Date: 27-12-2017
Publisher: IEEE
Date: 12-2015
Publisher: Cold Spring Harbor Laboratory
Date: 25-05-2023
DOI: 10.1101/2023.05.25.542242
Abstract: DNA methylation (5-methylcytosine, 5mC) is a repressive gene regulatory mark widespread in vertebrate genomes, yet the developmental dynamics in which 5mC patterns are established vary across species. While mammals undergo two rounds of global 5mC erasure, the zebrafish genome exhibits localized maternal-to-paternal 5mC remodeling, in which the sperm epigenome is inherited in the early embryo. To date, it is unclear how evolutionarily conserved such 5mC remodeling strategies are, and what their biological function is. Here, we studied 5mC dynamics during the embryonic development of sea lamprey (Petromyzon marinus), a jawless vertebrate which occupies a critical phylogenetic position as the sister group of the jawed vertebrates. We employed base-resolution 5mC quantification in the lamprey germline, embryonic and somatic tissues, and discovered large-scale maternal-to-paternal epigenome remodeling that affects % of the embryonic genome and is predominantly associated with partially methylated domains (PMDs). We further demonstrate that sequences eliminated during programmed genome rearrangement (PGR), a hallmark of lamprey embryogenesis, are hypermethylated in sperm prior to the onset of PGR. Our study thus unveils important insights into the evolutionary origins of vertebrate 5mC reprogramming, and how this process might participate in diverse developmental strategies.
Publisher: Springer Science and Business Media LLC
Date: 13-03-2019
DOI: 10.1038/S41598-019-40739-8
Abstract: The advent of Nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We incorporated multi-index merging into the Minimap2 aligner and demonstrate that long read alignment to the human genome can be performed on a system with 2 GB RAM with negligible impact on accuracy.
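The partition-then-merge idea in this abstract can be illustrated with a toy sketch. It is not the Minimap2 implementation: the `Hit` record, the `merge_hits` function and the scores below are hypothetical, and a real merge also has to recompute mapping quality. The sketch only shows why candidate hits from separately indexed reference partitions need to be pooled per read before a primary alignment is chosen.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical candidate alignment of one read against one reference partition."""
    read: str
    partition: int
    position: int
    score: int


def merge_hits(per_partition: List[List[Hit]]) -> Dict[str, List[Hit]]:
    """Pool the hits reported for each partition and rank them per read.

    After merging, the first hit in each list is the best-scoring candidate
    across the whole reference, not just within a single partition.
    """
    merged: Dict[str, List[Hit]] = {}
    for hits in per_partition:
        for hit in hits:
            merged.setdefault(hit.read, []).append(hit)
    for read_hits in merged.values():
        # Highest score first; ties keep their original (partition) order.
        read_hits.sort(key=lambda h: h.score, reverse=True)
    return merged


# Toy data: read r1 has a weak hit in partition 0 and a strong hit in partition 1.
partition_0 = [Hit("r1", 0, 1200, 35), Hit("r2", 0, 88000, 90)]
partition_1 = [Hit("r1", 1, 530, 95)]

merged = merge_hits([partition_0, partition_1])
primary_r1 = merged["r1"][0]
```

Without the merge step, each partition would report its own "best" hit for `r1`, and the weak hit in partition 0 would look like a primary alignment.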
Publisher: IEEE
Date: 12-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2018
Publisher: Association for Computing Machinery (ACM)
Date: 08-10-2019
DOI: 10.1145/3358211
Abstract: Treatment of patients using high-quality precision medicine requires a thorough understanding of the genetic composition of a patient. Ideally, the identification of unique variations in an individual’s genome is needed for specifying the necessary treatment. Variant calling workflow is a pipeline of tools, integrating state of the art software systems aimed at alignment, sorting and variant calling for the whole genome sequencing (WGS) data. This pipeline is utilized for identifying unique variations in an individual’s genome (compared to a reference genome). Currently, such a workflow is implemented on high-performance computers (with additional GPUs or FPGAs) or in cloud computers. Such systems are large, have a high cost, and rely on the internet for genome data transfer which makes the system unusable in remote locations unequipped with internet connectivity. It further raises privacy concerns due to processing being carried out in a different facility. To overcome such limitations, in this paper, for the first time, we present a cost-efficient, offline, scalable, portable, and energy-efficient computing system named SWARAM for variant calling workflow processing. The system uses novel architecture and algorithms to match against partial reference genomes to exploit smaller memory sizes which are typically available in tiny processing systems. Extensive tests on a standard benchmark data-set (NA12878 Illumina platinum genome) confirm that the time consumed for the data transfer and completing variant calling workflow on SWARAM was competitive to that of a 32-core Intel Xeon server with similar accuracy, but costs less than a fifth, and consumes less than 40% of the energy of the server system. The original scripts and code we developed for executing the variant calling workflow on SWARAM are available in the associated GitHub repository github.com/Rammohanty/swaram.
Publisher: IEEE
Date: 09-2016
Publisher: Oxford University Press (OUP)
Date: 30-05-2023
DOI: 10.1093/BIOINFORMATICS/BTAD352
Abstract: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as ‘basecalling’, is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format ‘SLOW5’ to streamline and accelerate nanopore basecalling on high-performance computing (HPC) and cloud environments. SLOW5 permits highly efficient sequential data access, eliminating a potential analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore’s Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling. Buttery-eel is available at github.com/Psy-Fer/buttery-eel.
Publisher: Cold Spring Harbor Laboratory
Date: 02-06-2023
DOI: 10.1101/2023.05.30.542681
Abstract: minimap2 is the gold-standard software for reference-based sequence mapping in third-generation long-read sequencing. While minimap2 is relatively fast, further speedup is desirable, especially when processing a multitude of large datasets. In this work, we present minimap2-fpga, a hardware-accelerated version of minimap2 that speeds up the mapping process by integrating an FPGA kernel optimised for chaining. We demonstrate speed-ups in end-to-end run-time for data from both Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio). minimap2-fpga is up to 79% and 53% faster than minimap2 for ∼30× ONT and ∼50× PacBio datasets respectively, when mapping without base-level alignment. When mapping with base-level alignment, minimap2-fpga is up to 62% and 10% faster than minimap2 for ∼30× ONT and ∼50× PacBio datasets respectively. The accuracy is near-identical to that of original minimap2 for both ONT and PacBio data, when mapping both with and without base-level alignment. minimap2-fpga is supported on Intel FPGA-based systems (evaluations performed on an on-premise system) and Xilinx FPGA-based systems (evaluations performed on a cloud system). We also provide a well-documented library for the FPGA-accelerated chaining kernel to be used by future researchers developing sequence alignment software with limited hardware background.
Publisher: IEEE
Date: 12-2014
Publisher: Cold Spring Harbor Laboratory
Date: 22-04-2021
DOI: 10.1101/2021.04.21.440861
Abstract: InterARTIC is an interactive web application for the analysis of viral whole-genome sequencing (WGS) data generated on Oxford Nanopore Technologies (ONT) devices. A graphical interface enables users with no bioinformatics expertise to analyse WGS experiments and reconstruct consensus genome sequences from individual isolates of viruses, such as SARS-CoV-2. InterARTIC is intended to facilitate widespread adoption and standardisation of ONT sequencing for viral surveillance and molecular epidemiology. We demonstrate the use of InterARTIC for the analysis of ONT viral WGS data from SARS-CoV-2 and Ebola virus, using a laptop computer or the internal computer on an ONT GridION sequencing device. We showcase the intuitive graphical interface, workflow customisation capabilities and job-scheduling system that facilitate execution of small- and large-scale WGS projects on any common virus. InterARTIC is a free, open-source web application implemented in Python. The application can be downloaded as a set of pre-compiled binaries that are compatible with all common Ubuntu distributions, or built from source. For further details please visit: github.com/Psy-Fer/interARTIC/.
Publisher: Cold Spring Harbor Laboratory
Date: 07-02-2023
DOI: 10.1101/2023.02.06.527365
Abstract: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as ‘basecalling’, is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format ‘SLOW5’ to streamline and accelerate nanopore basecalling on high-performance computer (HPC) and cloud environments. SLOW5 permits highly efficient sequential data access, eliminating a significant analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore’s Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling.
Publisher: Springer Science and Business Media LLC
Date: 09-12-2020
DOI: 10.1038/S41467-020-20075-6
Abstract: Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at % sensitivity and % precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that were supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
Publisher: Cold Spring Harbor Laboratory
Date: 20-06-2022
DOI: 10.1101/2022.06.19.496732
Abstract: Nanopore sequencing is an emerging technology that is being rapidly adopted in research and clinical genomics. We recently developed SLOW5, a new file format for storage and analysis of raw data from nanopore sequencing experiments. SLOW5 is a community-centric, open source format that offers considerable performance benefits over the existing nanopore data format, known as FAST5. Here we introduce slow5tools, a simple, intuitive toolkit for handling nanopore raw signal data in SLOW5 format. Slow5tools enables lossless FAST5-to-SLOW5 and SLOW5-to-FAST5 data conversion, and a range of tools for structuring, indexing, viewing and querying SLOW5 files. Slow5tools uses multi-threading, multi-processing and other engineering strategies to achieve fast data conversion and manipulation, including live FAST5-to-SLOW5 conversion during sequencing. We outline a series of examples and benchmarking experiments to illustrate slow5tools usage, and describe the engineering principles underpinning its high performance. Slow5tools is an essential toolkit for handling nanopore signal data, which was developed to support adoption of SLOW5 by the nanopore community. Slow5tools is written in C/C++ with minimal dependencies and is freely available as an open-source program under an MIT licence: asindu2008/slow5tools.
Publisher: Springer Science and Business Media LLC
Date: 05-08-2020
DOI: 10.1186/S12859-020-03697-X
Abstract: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at asindu2008/f5c.
Publisher: Cold Spring Harbor Laboratory
Date: 30-06-2021
DOI: 10.1101/2021.06.29.450255
Abstract: Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data – known as FAST5 – that prevents efficient analysis on high-performance computing (HPC) systems. To overcome this, we have developed SLOW5, an alternative file format that permits efficient parallelisation and, thereby, acceleration of nanopore data analysis. For example, we show that using SLOW5 format, instead of FAST5, reduces the time and cost of genome-wide DNA methylation profiling by an order of magnitude on common HPC systems, and delivers consistent improvements on a wide range of different architectures. With a simple, accessible file structure and a ~25% reduction in size compared to FAST5, SLOW5 format will deliver substantial benefits to all areas of the nanopore community.
Publisher: Cold Spring Harbor Laboratory
Date: 10-2021
DOI: 10.1101/2021.09.27.21263187
Abstract: Short-tandem repeat (STR) expansions are an important class of pathogenic genetic variants. Over forty neurological and neuromuscular diseases are caused by STR expansions, with 37 different genes implicated to date. Here we describe the use of programmable targeted long-read sequencing with Oxford Nanopore’s ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single, simple assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of expanded and non-expanded STR sites. In doing so, the assay correctly diagnoses all individuals in a cohort of patients (n = 27) with various neurogenetic diseases, including Huntington’s disease, fragile X syndrome and cerebellar ataxia (CANVAS) and others. Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing, and identifies non-canonical STR motif conformations and internal sequence interruptions. Even in our relatively small cohort, we observe a wide diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of STR expansion disorders. Finally, we show how the flexible inclusion of pharmacogenomics (PGx) genes as secondary ReadUntil targets can identify clinically actionable PGx genotypes to further inform patient care, at no extra cost. Our study addresses the need for improved techniques for genetic diagnosis of STR expansion disorders and illustrates the broad utility of programmable long-read sequencing for clinical genomics. This study describes the development and validation of a programmable targeted nanopore sequencing assay for parallel genetic diagnosis of all known pathogenic short-tandem repeats (STRs) in a single, simple test.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2023
Publisher: American Association for the Advancement of Science (AAAS)
Date: 04-03-2022
Abstract: More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore’s ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites, from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37) including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.
Publisher: ACM
Date: 02-11-2020
Publisher: IEEE
Date: 2017
Publisher: Research Square Platform LLC
Date: 13-07-2021
DOI: 10.21203/RS.3.RS-668517/V1
Abstract: Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data – known as FAST5 – that prevents efficient analysis on high-performance computing (HPC) systems. To overcome this we have developed SLOW5, an alternative file format that permits efficient parallelisation and, thereby, acceleration of nanopore data analysis. For example, we show that using SLOW5 format, instead of FAST5, reduces the time and cost of genome-wide DNA methylation profiling by an order of magnitude on common HPC systems, and delivers consistent improvements on a wide range of different architectures. With a simple, accessible file structure and a ~25% reduction in size compared to FAST5, SLOW5 format will deliver substantial benefits to all areas of the nanopore community.
Publisher: Oxford University Press (OUP)
Date: 28-12-2022
DOI: 10.1093/GIGASCIENCE/GIAD046
Abstract: Third-generation nanopore sequencers offer selective sequencing or “Read Until” that allows genomic reads to be analyzed in real time and abandoned halfway if not belonging to a genomic region of “interest.” This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. The latency in analyzing should be as low as possible for selective sequencing to be effective so that unnecessary reads can be rejected as early as possible. However, existing methods that employ a subsequence dynamic time warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone–sized MinION sequencer. In this article, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware–software codesign-based method that exploits a low-cost and portable heterogeneous multiprocessor system-on-chip platform with on-chip field-programmable gate arrays (FPGA) to accelerate the sDTW-based Read Until algorithm. Experimental results show that HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5× faster than a highly optimized multithreaded software version (around 85× faster than the existing unoptimized multithreaded software) running on a sophisticated server with a 36-core Intel Xeon processor for a SARS-CoV-2 dataset. The energy consumption of HARU is 2 orders of magnitude lower than the same application executing on the 36-core server. HARU demonstrates that nanopore selective sequencing is possible on resource-constrained devices through rigorous hardware–software optimizations. The source code for the HARU sDTW module is available as open source at eebdev/HARU, and an example application that uses HARU is at eebdev/sigfish-haru.
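For readers unfamiliar with subsequence dynamic time warping, the following is a minimal software sketch of the textbook algorithm the abstract refers to. It is not the HARU hardware design or its source code: the function name `subsequence_dtw`, the absolute-difference cost and the toy signals are assumptions chosen for illustration. Unlike ordinary DTW, the query may start and end anywhere in the reference, which is what makes it suitable for locating a short signal prefix inside a long reference signal.

```python
from typing import Sequence, Tuple


def subsequence_dtw(query: Sequence[float],
                    reference: Sequence[float]) -> Tuple[float, int]:
    """Return (lowest alignment cost, end index in reference) for the query.

    The query must be matched in full, but it may begin and end at any
    position of the reference (free start, free end).
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")

    inf = float("inf")
    # Row 0 is all zeros: starting the match at any reference position is free.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        # Column 0 is infinite: a non-empty query cannot match an empty reference.
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur

    # Free end: take the cheapest cell in the last row.
    best_j = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j - 1


# Toy signals: the query occurs exactly inside the reference.
cost, end = subsequence_dtw([1.0, 2.0, 3.0], [9.0, 9.0, 1.0, 2.0, 3.0, 9.0])
```

In a selective-sequencing setting, a read would typically be kept or rejected by comparing the returned cost against a threshold; the threshold itself is application-specific and is not specified here. The nested loops make the cost of the method proportional to the product of the two signal lengths, which is the workload the abstract describes accelerating.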
Publisher: Springer Science and Business Media LLC
Date: 03-01-2022
DOI: 10.1038/S41587-021-01147-4
Abstract: Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 is approximately 25% smaller than FAST5 and delivers consistent improvements on different computer architectures.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2023
Publisher: Springer Science and Business Media LLC
Date: 29-09-2020
DOI: 10.1038/S42003-020-01270-Z
Abstract: The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.
Publisher: Cold Spring Harbor Laboratory
Date: 24-03-2020
DOI: 10.1101/2020.03.22.002030
Abstract: F5N is the first ever Android application for nanopore sequence analysis on a mobile phone, comprised of popular tools for read alignment (Minimap2), sequence data manipulation (Samtools) and methylation calling (F5C/Nanopolish). On NA12878 nanopore data, F5N can perform a complete methylation calling pipeline on a mobile phone in ∼15 minutes for a batch of 4000 nanopore reads (∼34 megabases). F5N is not only a toolkit but also a framework for integrating existing C/C++ based command line tools to run on Android. F5N will enable performing nanopore sequence analysis on-site when used with an ultra-portable nanopore sequencer (e.g. MinION or the anticipated SmidgION), consequently reducing the cost for special computers and high-speed Internet. F5N Android application is available on Google Play store at tore/apps/details?id=com.mobilegenomics.genopo& hl=en and the source code is available on GitHub at github.com/SanojPunchihewa/f5n. hirunas@eng.pdn.ac.lk
Publisher: Cold Spring Harbor Laboratory
Date: 05-09-2020
DOI: 10.1101/756122
Abstract: Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA), a commonly used approach to polish sequencing data and identify non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, compute and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at asindu2008/f5c.
Publisher: IEEE
Date: 11-08-2021
Publisher: Cold Spring Harbor Laboratory
Date: 25-10-2022
DOI: 10.1101/2022.10.24.513498
Abstract: Nanopore sequencing allows selective sequencing, the ability to programmatically reject unwanted reads in a sample. Selective sequencing has many present and future applications in genomics research and the classification of species from a pool of species is an example. Existing methods for selective sequencing for species classification are still immature and the accuracy highly varies depending on the datasets. For the five datasets we tested, the accuracy of existing methods varied in the range of ~77%-97% (average accuracy %). Here we present DeepSelectNet, an accurate deep-learning-based method that can directly classify nanopore current signals belonging to a particular species. DeepSelectNet utilizes novel data preprocessing techniques and improved neural network architecture for regularization. For the five datasets tested, DeepSelectNet’s accuracy varied between ~91%-99% (average accuracy ~95%). At its best performance, DeepSelectNet achieved a nearly 12% accuracy increase compared to its deep learning-based predecessor SquiggleNet. Furthermore, precision and recall evaluated for DeepSelectNet on average were always % (average ~95%). In terms of execution performance, DeepSelectNet outperformed SquiggleNet by ~13% on average. Thus, DeepSelectNet is a practically viable method to improve the effectiveness of selective sequencing. Compared to base alignment and deep learning predecessors, DeepSelectNet can significantly improve the accuracy to enable real-time species classification using selective sequencing. The source code of DeepSelectNet is available at github.com/AnjanaSenanayake/DeepSelectNet.
Publisher: Cold Spring Harbor Laboratory
Date: 10-05-2023
DOI: 10.1101/2023.05.09.539953
Abstract: In silico simulation of next-generation sequencing data is a technique used widely in the genomics field. However, there is currently a lack of optimal tools for creating simulated data from ‘third-generation’ nanopore sequencing devices, which measure DNA or RNA molecules in the form of time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulation of realistic nanopore signal data. Squigulator takes a reference genome, transcriptome or read sequences and generates corresponding raw nanopore signal data. This is compatible with basecalling software from Oxford Nanopore Technologies (ONT) and other third-party tools, thereby providing a useful substrate for testing, debugging, validation and optimisation of nanopore analysis methods. The user may generate noise-free ‘ideal’ data, realistic data with noise profiles emulating specific ONT protocols, or they may deterministically modify noise parameters and other variables to shape the data to their needs. To highlight its utility, we use Squigulator to model the degree to which different types of noise impact the accuracy of ONT basecalling and downstream variant detection, revealing new insights into the properties of ONT data. We provide Squigulator as an open-source tool for the nanopore community: asindu2008/squigulator
Start Date: 2023
End Date: 12-2025
Amount: $439,110.00
Funder: Australian Research Council
Start Date: 03-2023
End Date: 02-2026
Amount: $453,913.00
Funder: Australian Research Council