Continuing from our previous post, this part takes a closer look at the bioinformatic analysis that Novogene offers.
3. Bioinformatic analysis of the standard package
3.1. Alignment with the reference genome: Also called mapping, this step aligns the sequencing reads to the reference genome and provides statistics on sequencing depth and coverage.

Figure 7: a,b) Sequencing depth and cumulative sequencing depth; c) Statistics of mapping, coverage and depth in each sample
- The sequencing depth distribution of all bases in each sample (figure 7a) shows that the distribution peaks at approximately 25×–30×, meaning the largest fraction of bases is sequenced at this depth.
- The cumulative sequencing depth distribution of all bases in each sample (figure 7b) shows that 80% of bases are covered at 25×–30× or more, which is sufficient depth for variant detection.
- Statistics of mapping, coverage and depth in each sample (figure 7c):
  - Mapped Reads: the percentage of sequencing reads aligned to the reference genome. This value is typically expected to be greater than 98%, indicating high-quality alignment.
  - Properly Mapped Reads: the percentage of paired-end reads mapped with the correct orientation and insert size. This value is generally expected to be greater than 95%, reflecting reliable paired-end mapping.
  - Coverage at least 10×: the percentage of the genome covered by at least 10 reads. This value should typically be greater than 98%.
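To make the depth and mapping metrics above concrete, here is a minimal Python sketch. It assumes per-base depths are already available as a plain list of integers (for example, parsed from the third column of `samtools depth -a` output). The function names, read counts and toy depth values are illustrative assumptions, not part of Novogene's pipeline.

```python
from collections import Counter


def depth_distribution(depths):
    """Fraction of bases observed at each exact depth (the idea behind figure 7a)."""
    total = len(depths)
    counts = Counter(depths)
    return {d: counts[d] / total for d in sorted(counts)}


def fraction_at_least(depths, min_depth):
    """Fraction of bases covered by at least `min_depth` reads (the idea behind figure 7b)."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def mapping_rates(total_reads, mapped_reads, properly_paired_reads):
    """Mapped and properly paired reads as percentages of all reads."""
    return (
        100.0 * mapped_reads / total_reads,
        100.0 * properly_paired_reads / total_reads,
    )


# Toy data (hypothetical): depths for ten genomic positions.
toy_depths = [0, 8, 12, 25, 28, 30, 30, 31, 33, 40]
coverage_10x = fraction_at_least(toy_depths, 10)
coverage_25x = fraction_at_least(toy_depths, 25)
mapped_pct, proper_pct = mapping_rates(1000, 990, 960)
print(f"bases at >=10x: {coverage_10x:.0%}, at >=25x: {coverage_25x:.0%}")
print(f"mapped: {mapped_pct:.1f}%, properly paired: {proper_pct:.1f}%")
```

On real data the same functions would be applied to the full per-base depth vector of each sample, and the resulting percentages compared against the thresholds listed above.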
3.2. Germline and somatic mutation detection


- SNP/Indel Detection Result (figure 9):
  - Pie chart:
    - Missense SNPs and synonymous SNPs are the most prevalent types in the “cancer2” sample.
    - SNPs are predominantly located within ncRNA and intronic regions. Although these regions do not code for proteins, they can contain or interact with regulatory elements, so SNPs located there may indirectly modulate gene activity.
  - Summary tables:
    - Summary tables offer detailed insights into the distribution of SNPs across genomic regions.
    - Whole Genome Sequencing (WGS) data reveals approximately 4 million SNPs for each sample, each with a precise genomic location, enabling comprehensive variant analysis.
    - Around 120,000 novel SNPs were identified, that is, variants not yet cataloged in known databases.
    - These novel SNPs could be critical for exploring genetic features and mechanisms in cancer that may not be fully captured by existing reference data.
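How a "novel" count is derived can be sketched as a set difference: a variant is treated as novel when its (chromosome, position, reference allele, alternate allele) key is absent from a catalogue of known variants, such as dbSNP. The records, the `known_variants` set and the region labels below are made-up illustrations; Novogene's actual annotation workflow is not described in this post.

```python
from collections import Counter

# Hypothetical called SNPs: (chrom, pos, ref, alt, annotated region).
called_snps = [
    ("chr1", 10177, "A", "C", "intronic"),
    ("chr1", 20500, "G", "A", "exonic"),
    ("chr2", 31000, "T", "C", "ncRNA"),
    ("chr2", 45000, "C", "T", "intronic"),
]

# Hypothetical catalogue of already known variants.
known_variants = {
    ("chr1", 10177, "A", "C"),
    ("chr2", 31000, "T", "C"),
}


def split_known_novel(snps, known):
    """Partition SNPs into known and novel by their (chrom, pos, ref, alt) key."""
    known_hits, novel = [], []
    for snp in snps:
        key = snp[:4]
        (known_hits if key in known else novel).append(snp)
    return known_hits, novel


def count_by_region(snps):
    """Tally SNPs per annotated genomic region, as a summary table would."""
    return Counter(snp[4] for snp in snps)


known_hits, novel = split_known_novel(called_snps, known_variants)
region_counts = count_by_region(called_snps)
print(f"known: {len(known_hits)}, novel: {len(novel)}")
print(dict(region_counts))
```

Matching on the full allele key rather than on position alone avoids counting a new alternate allele at a known site as "known".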

Figure 10: a) Number of different types of SV in each sample, b) SV detection result table, c) The size of genomic regions affected by CNVs in each sample, d) CNV detection result table
- SV/CNV Detection Result (figure 10):
  - Plot (a): Number of different types of SVs in each sample
    - Similar SV numbers across samples: All samples (cancer and healthy controls) show a similar total number of SVs, around 3,000 each, with no extreme differences between samples.
    - SV type distribution: Deletions (DEL) dominate in all samples, accounting for the largest share of SVs, followed by breakends (BND, typically reflecting translocations) and duplications (DUP).
    - Low frequency of inversions (INV) and insertions (INS): Both INV and INS make up much smaller proportions than the other SV types.
  - Table (b): SV Detection Result
    - SV variation: The counts for each SV type are roughly similar across samples, with minor differences:
      - Cancer samples (cancer1, cancer2, cancer3) generally have slightly higher counts for all SV types than healthy controls (H1, H2, H3).
      - The difference is most visible for BND and DUP, pointing to potential structural rearrangements in the cancer samples.
  - Plot (c): The size of genomic regions affected by CNVs in each sample
    - Cancer sample variation: cancer2 has a considerably larger affected genomic region than the other samples, in terms of both CNV gains (gain_size) and losses (loss_size). This suggests greater genomic instability in cancer2.
    - Healthy samples: H1, H2 and H3 show much smaller affected regions, implying fewer and smaller CNV alterations in healthy controls than in cancer samples.
    - CNV gains vs. losses: In most samples, gain_size (gain of genomic material) clearly exceeds loss_size (loss of genomic material).
  - Table (d): CNV Detection Result
    - Cancer samples have more CNVs: cancer1, cancer2 and cancer3 show higher counts of CNV gains and losses than healthy controls. cancer3 has the highest total CNV count, with a gain_size of almost 46 Mb, indicating substantial genomic alterations.
    - High CNV impact in cancer2: cancer2 stands out with the largest gain_size (105.7 Mb), suggesting more extensive genomic duplication or amplification events in this sample.
    - Healthy samples with lower CNV activity: H1, H2 and H3 show comparatively lower CNV activity, especially for loss_size, which could indicate a more stable genomic profile in the healthy controls.
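The per-sample SV counts and CNV sizes discussed above can be derived from variant records with a few lines of Python. This is a minimal sketch under stated assumptions: the SV and CNV records are invented toy tuples, and the field layout (sample, type, start, end) is hypothetical rather than the format of Novogene's result tables.

```python
from collections import Counter, defaultdict

# Hypothetical SV calls: (sample, SV type).
sv_calls = [
    ("cancer1", "DEL"), ("cancer1", "DEL"), ("cancer1", "BND"),
    ("cancer1", "DUP"), ("H1", "DEL"), ("H1", "INV"),
]

# Hypothetical CNV segments: (sample, "gain" or "loss", start, end).
cnv_segments = [
    ("cancer1", "gain", 1_000_000, 3_500_000),
    ("cancer1", "loss", 5_000_000, 5_400_000),
    ("H1", "gain", 2_000_000, 2_100_000),
]


def sv_type_counts(calls):
    """Count SV types per sample (the quantity plotted in panel a)."""
    counts = defaultdict(Counter)
    for sample, sv_type in calls:
        counts[sample][sv_type] += 1
    return counts


def cnv_sizes_mb(segments):
    """Sum gain and loss segment lengths per sample, in megabases (panel c)."""
    sizes = defaultdict(lambda: {"gain_size": 0.0, "loss_size": 0.0})
    for sample, kind, start, end in segments:
        sizes[sample][f"{kind}_size"] += (end - start) / 1_000_000
    return sizes


sv_counts = sv_type_counts(sv_calls)
cnv_sizes = cnv_sizes_mb(cnv_segments)
for sample in sv_counts:
    print(sample, dict(sv_counts[sample]), cnv_sizes[sample])
```

Comparing `gain_size` and `loss_size` per sample in this way is what allows statements such as "gains exceed losses in most samples" to be checked directly against the tables.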

Novogene provides similar summary tables and plots for somatic mutation analysis, in the same form as shown above for germline mutation analysis.
Novogene offers a high-quality, affordable and convenient whole genome sequencing solution with advanced WGS analysis, suitable for a wide variety of applications. As Vietnam’s official distributor, GeneSmart is proud to offer comprehensive support for Novogene’s trusted services.
Read more about Novogene’s sequencing services offered by GeneSmart here.
Refer to “Unveiling the Human Genome using cutting-edge Sequencing techniques: WGS & WES” to start your WGS and WES journey.
Stay tuned for our next blog where we introduce you to some recognized publications that apply WGS in their research.
References
- Novogene. (2021). Unveiling the human genome using cutting-edge sequencing techniques: WGS & WES [Webinar]. https://www.novogene.com/amea-en/resources/onlineevent/unveiling-the-human-genome-using-cutting-edge-sequencing-techniques-wgs-wes/
- Novogene Co., Ltd. (n.d.). Whole genome sequencing analysis demo report.
------------
GENESMART CO., LTD | Authorized distributor of 10X Genomics, Altona, Biosigma, Hamilton, IT-IS (Novacyt), Norgen Biotek, Rainin in Vietnam.