An Introduction to NeoGenomics’ Bioinformatics Services

NeoGenomics' bioinformatics services are designed to support our next-generation sequencing services by presenting the massive amounts of data generated by our Illumina® MiSeq®, HiSeq® 2500, Life Technologies Ion Proton, and Ion PGM platforms in an ordered and manageable format. Our bioinformatics services are highly customizable, and we will work with you to generate files and reports that reveal precisely the data you are looking for.

Bioinformatics Services

Bioinformatics in Next-Generation Sequencing (NGS) Data Analysis Includes Two Major Areas

  1. NGS Data Quality Control
    NGS data quality control (QC) is applied to every sequence we generate to ensure that all of our customers receive high-quality sequencing data. In addition to the "Summary.htm" file, which indicates the quality of the NGS run, NGS data QC covers many other important aspects of sequence quality: basic statistics (total number of reads, GC%, sequence length, etc.), per-base sequence quality, per-sequence quality scores, per-base GC content, per-sequence GC content, per-base N content, duplicate sequences and overrepresented sequences (sequences which represent more than 0.1% of the total). Our bioinformatics team uses these parameters to decide whether and how further analysis should be conducted.

  2. NGS Data Analysis
    NeoGenomics is able to analyze almost all types of sequence data generated by NGS platforms. These include mRNA sequences, micro-RNA sequences, captured DNA sequences and bisulfite DNA sequences.
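
As an illustration of the basic QC statistics named under NGS Data Quality Control above, the following minimal Python sketch computes read count, mean length, GC%, N content, duplicate reads and overrepresented sequences for a list of reads. The function name `qc_summary` and the input format (plain sequence strings) are assumptions made for this example; it is not a description of the software NeoGenomics runs.

```python
from collections import Counter

def qc_summary(reads, overrep_threshold=0.001):
    """Compute basic QC statistics for a list of read sequences (strings).

    Hypothetical helper for illustration only. overrep_threshold=0.001
    mirrors the 0.1% cut-off for overrepresented sequences.
    """
    total = len(reads)
    bases = sum(len(r) for r in reads)
    if total == 0 or bases == 0:
        raise ValueError("no sequence data supplied")
    gc = sum(r.count("G") + r.count("C") for r in reads)
    n = sum(r.count("N") for r in reads)
    counts = Counter(reads)
    # Sequences that make up more than the threshold fraction of all reads
    overrepresented = {s: c for s, c in counts.items()
                       if c / total > overrep_threshold}
    return {
        "total_reads": total,
        "mean_length": bases / total,
        "gc_percent": 100.0 * gc / bases,
        "n_percent": 100.0 * n / bases,
        "duplicate_reads": total - len(counts),  # reads beyond the first copy
        "overrepresented": overrepresented,
    }

reads = ["ACGT", "ACGT", "GGCC", "ANNT"]
summary = qc_summary(reads)
print(summary["total_reads"], summary["gc_percent"], summary["n_percent"])
```

With only four toy reads, every sequence exceeds the 0.1% threshold; the cut-off becomes meaningful only on runs with many thousands of reads.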

The following is a list of bioinformatics deliverables and capabilities for common NGS applications:

Whole-Genome de novo Sequencing

The whole-genome de novo sequencing services include draft sequence assembly. The completeness of the assembly depends on the size, content and complexity of the genome as well as the amount of sequencing coverage generated. Metrics such as quality scores for each base, number of scaffolds, number of contigs and the N50 number will be provided. Provided metrics can be further customized based on the customer's requests. We do not currently provide annotation services for novel organisms. The following files will be provided with your de novo assembly:

  • Assembly summary with contig and scaffold metrics
  • Assembled scaffold(s)
  • All contigs
  • Depth of coverage
  • Raw sequencing reads
  • Quality values
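
For readers unfamiliar with the N50 metric mentioned above, the sketch below shows how it is conventionally computed from a list of contig (or scaffold) lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

contig_lengths = [100, 80, 50, 40, 30]  # made-up lengths for illustration
print(n50(contig_lengths))
```

Here the total is 300 bases; the two longest contigs (100 + 80) cover more than half of it, so the N50 is 80.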

Whole-Genome Resequencing

The whole-genome resequencing services include reference matching, SNP calling, copy number variation, InDels (small and large), quality scores, base coverage and SNP quality metrics. The following files will be provided with your resequenced genome:

  • Consensus sequence
  • Alignment report summarizing mapping results
  • SNP report including base change, coverage at SNP position, SNP type, affected gene and amino acid change (if applicable)
  • Raw sequencing reads with quality scores
  • Reads matching the genome including position & number of mismatches
  • Depth of coverage
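
Depth of coverage is the number of aligned reads that overlap each reference position. The following Python sketch computes it under simplifying assumptions: each alignment is given as a hypothetical `(start, length)` pair with a 0-based start, and gaps or clipping within a read are ignored. Production tools work directly from alignment files; this only illustrates the idea.

```python
def depth_of_coverage(alignments, ref_length):
    """Return per-base depth for a reference of ref_length bases.

    alignments: iterable of (start, length) pairs, 0-based start
    (an assumed input format for this example).
    """
    depth = [0] * ref_length
    for start, length in alignments:
        # Clip each alignment to the reference boundaries
        for pos in range(max(start, 0), min(start + length, ref_length)):
            depth[pos] += 1
    return depth

depth = depth_of_coverage([(0, 5), (3, 5), (8, 5)], ref_length=10)
mean_depth = sum(depth) / len(depth)
print(depth, mean_depth)
```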

Whole-Transcriptome Sequencing (RNA-Seq)

The whole-transcriptome sequencing services include counts for all known mRNAs, heat maps, SNPs, novel transcribed regions with associated counts, novel splice variants and evidence of fusion genes. Our data analysis also provides intron-exon junction sites and an overall quantification of differential expression and gene regulation. The following files will be provided with your whole-transcriptome results:

  • Raw sequencing reads and quality scores
  • Alignment report summarizing mapping results
  • Reads matching the genome including position and number of mismatches
  • Counts file containing the number of reads matching annotated exons
  • Coverage files allowing visualization in the UCSC genome browser
  • Optimization plots to aid novel transcribed region identification
  • Putative transcribed regions
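
To show what a counts file of reads matching annotated exons represents, here is a minimal sketch that counts reads overlapping each exon. The coordinate convention (half-open `[start, end)` intervals on a single chromosome), the function name `count_reads_per_exon` and the example data are all assumptions for illustration; real pipelines also handle strand, multi-mapping reads and spliced alignments.

```python
def count_reads_per_exon(exons, reads):
    """Count reads overlapping each exon.

    exons: list of (name, start, end) tuples; reads: list of (start, end)
    tuples. All intervals are half-open and on the same chromosome
    (assumed conventions for this example).
    """
    counts = {name: 0 for name, _, _ in exons}
    for r_start, r_end in reads:
        for name, e_start, e_end in exons:
            # Two half-open intervals overlap when each starts before the other ends
            if r_start < e_end and e_start < r_end:
                counts[name] += 1
    return counts

exons = [("exon1", 100, 200), ("exon2", 300, 400)]
reads = [(90, 140), (150, 190), (195, 320), (500, 550)]
exon_counts = count_reads_per_exon(exons, reads)
print(exon_counts)
```

Note that the third read spans both exons and is counted once for each, while the last read falls outside the annotation and is not counted.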

Small RNA Profiling and Discovery

The small RNA profiling services include counts for all known small RNAs (under 40 base pairs long) as well as counts for novel small RNAs. Micro-RNA (miRNA) discovery represents a burgeoning field within the biological research community. These short nucleotide sequences are thought to play a critical role in gene regulation. Our experienced bioinformatics specialists will use cutting-edge tools to map each read to known miRNA libraries, allowing for the elucidation of miRNA-based gene regulatory changes within your samples. The following files will be provided with your small RNA profiling project:

  • Raw sequencing reads and quality scores
  • Alignment reports for matching to filter database, small RNA database and genome
  • Reads matching filter and small RNA databases including length and position of match
  • Reads matching the genome including length and position of match
  • Counts file including number of hits against filter and small RNA databases
  • Coverage files allowing visualization in the UCSC genome browser

Captured DNA-Seq Data

Identifying sequence variants (SNPs and InDels) is important for clinicians diagnosing patients with genetic disease. NeoGenomics can use a variety of open-source software, such as SAMtools and SnpEff, to identify and annotate variants in captured DNA sequences. As a certified service provider for both the Agilent SureSelect Target Enrichment System and the Illumina TruSeq Arrays for Human Exome Capture, as well as Custom Target Capture Arrays, NeoGenomics can provide its customers with industry-leading DNA capture technologies and comprehensive bioinformatics support. With these tools in hand, our customers can focus on using their results to identify variations related to disease-causing changes in protein structure and function. The following files will be provided with your Captured DNA-Seq project:

  • Sorted-NoDup.bam: This file contains information about sequence alignment
  • Sorted-NoDup.bam.bai: This is the index file for Sorted-NoDup.bam. Users can load both the Sorted-NoDup.bam and Sorted-NoDup.bam.bai files into software such as the Integrative Genomics Viewer (IGV) to view the variants detected
  • Var.vcf: This file contains all the variants and their annotations. The file is in standard Variant Call Format (.vcf). Because it is plain text, users can open this file with a text editor such as WordPad
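
Because a VCF file is tab-delimited text, its variant records can also be read programmatically. The sketch below parses the first five standard columns (CHROM, POS, ID, REF, ALT) and labels each alternate allele as a SNP or an InDel. The function name `parse_vcf`, the classification rule and the example records are assumptions for illustration; symbolic alleles and annotation fields are not handled.

```python
def parse_vcf(text):
    """Parse VCF text into (chrom, pos, ref, alt, kind) tuples.

    kind is "SNP" when REF and ALT are single bases, "InDel" when their
    lengths differ, and "other" otherwise (a simplification).
    """
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # ALT may list several alleles
            if len(ref) == 1 and len(allele) == 1:
                kind = "SNP"
            elif len(ref) != len(allele):
                kind = "InDel"
            else:
                kind = "other"
            variants.append((chrom, int(pos), ref, allele, kind))
    return variants

# Made-up records for illustration only
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.",
    "chr2\t67890\t.\tCT\tC\t45\tPASS\t.",
])
variants = parse_vcf(example)
print(variants)
```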

Serial Analysis of Gene Expression (SAGE)

The SAGE services provide an efficient means of comprehensive expression analysis with the capability to run multiple samples on a single slide. The following files will be provided with your SAGE project:

  • Raw sequencing reads
  • Mapping summary listing each tag, its frequency, its GenBank Identifier and a brief description of the identified gene
  • Reads matching the reference including their mapping positions and mismatches
  • File comparing tags in two different samples
  • File calculating the abundances of repeat reads
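
A file comparing tags in two samples essentially tabulates how often each tag occurs in each sample. The following sketch shows one simple way to build such a table; the function name `compare_tags` and the example tag sequences are hypothetical and serve only as an illustration.

```python
from collections import Counter

def compare_tags(sample_a, sample_b):
    """Return {tag: (count_in_a, count_in_b)} for every tag seen in either sample."""
    a, b = Counter(sample_a), Counter(sample_b)
    # A Counter returns 0 for tags it has not seen, so missing tags need no special case
    return {tag: (a[tag], b[tag]) for tag in sorted(set(a) | set(b))}

sample_a = ["AAAC", "AAAC", "GGTT"]  # made-up tags
sample_b = ["AAAC", "CCGA"]
comparison = compare_tags(sample_a, sample_b)
for tag, (count_a, count_b) in comparison.items():
    print(tag, count_a, count_b)
```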