by Rozaimi Razali 1 year ago
Cluster Generation
The fragments hybridize to the surface of the flow cell
DNA polymerase binds to the hybridized fragment and creates a complementary strand
The original template fragment is washed away
The newly created strand then hybridizes with a neighbouring oligonucleotide attached to the flow cell
This amplification process is repeated
Sequencing
The fragments are then sequenced using fluorescently tagged nucleotides
What are fluorescent probes?
molecules that absorb light of a specific wavelength and emit light of a different wavelength
ONLY the fragments with the adapters attached are amplified
Remember there are 2 types of oligonucleotides on the flow cell
one is complementary to the start adapter, the other to the end adapter
What is the purpose of these adapters?
save $$$
Allow the fragments to bind to the oligonucleotides on the flow cell
~5Gb
5-10Gb
SAAPdap
SuSpect
Missense3D
LS-SNP/PDB
Annotate
SNPeffect
Variant Effect Predictor
Annotate & Rank
AnnotSV
for SVs
PVP
Random Forest Classifier
Annovar
Exomiser
Rank
DANN
CADD
POLYPHEN
SIFT
Gene Set Enrichment Analysis
ClusterProfiler
Cytoscape
Enrichment Map
given a set of genes, expression data and list of phenotypes
identify statistically significant, concordant differences between two phenotypic states
other DB
dbSNP
all known SNPs as reported by GRCh, NCBI, HapMap and the 1000 Genomes Project
Good for seeing AF for SNPs
gnomAD
aggregates publicly available WGS and WES data
Good to see AF for a SNP or SV
eQTL/sQTL database
eQTL catalogue
GTEx
sQTL
variant affecting splicing
eQTL
locus affecting expression
co-localization GWAS and eQTL
COLOC
QTLtools
Disease specific DB
Cancer
COSMIC
TCGA
OMIM
Raredisease.gov
Genetics Home Reference
e.g. associated genes and known pathogenic variants
e.g. identify mode of inheritance (MOI) of the disease/phenotype
Once we have identified the variants, we want to look at the pathways they are involved in
Paid
Pathway Studio
Ingenuity Pathway Analysis
Free/Open-source
KEGG
REACTOME
HumanCyc
BioCyc
Alignment QC
Identify PCR duplicates
Samblaster
Picard MarkDuplicates
What are PCR duplicates?
If a small error is introduced during the PCR process
this error will be amplified
leaving many reads that contain the same error
Duplicates: multiple reads copied from the same original fragment during PCR
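The idea of flagging duplicates can be sketched in a few lines. This is a deliberately simplified illustration, not how Samblaster or Picard MarkDuplicates are implemented: it assumes single-end reads and treats reads that share chromosome, start position and strand as copies of one fragment, keeping the one with the highest mean quality. The `Read` type and `mark_duplicates` function are hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record used only for this sketch."""
    name: str
    chrom: str
    start: int        # 5' alignment position
    strand: str       # "+" or "-"
    mean_qual: float  # mean base quality of the read


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads with the same (chrom, start, strand) are grouped together;
    the read with the highest mean quality is kept as the representative
    and all others in the group are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.mean_qual)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", 35.0),
    Read("r2", "chr1", 100, "+", 30.0),  # same position and strand as r1
    Read("r3", "chr1", 100, "+", 28.0),  # same position and strand as r1
    Read("r4", "chr1", 100, "-", 33.0),  # opposite strand, separate group
    Read("r5", "chr2", 500, "+", 31.0),
]
print(sorted(mark_duplicates(reads)))
```

Real tools also take mate positions and soft-clipping into account, so treat this only as a picture of the grouping step.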
Tools
Minimap2
for long-reads
BWA-MEM
for short-reads
Critical phase
Raw sequence data are aligned to the reference sequence
The output is an alignment file in SAM/BAM format
it records where each read mapped in the reference genome
e.g. of reference genome
CHM13
GRCh37/38
all downstream analyses and interpretation depend on the quality of the alignment
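To make "which reads mapped where" concrete, here is a minimal sketch that reads one SAM alignment line. It relies on the 11 mandatory tab-separated SAM columns and three standard FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x400 duplicate). The function name `parse_sam_line` and the example record are made up for illustration; in practice a library such as pysam or the samtools CLI would be used instead.

```python
SAM_FIELDS = [
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
]


def parse_sam_line(line):
    """Parse the 11 mandatory columns of a single SAM alignment line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")

    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])

    # Decode a few commonly used FLAG bits.
    record["is_unmapped"] = bool(record["flag"] & 0x4)
    record["is_reverse"] = bool(record["flag"] & 0x10)
    record["is_duplicate"] = bool(record["flag"] & 0x400)
    return record


# Invented example record: a read on the reverse strand of chr1.
example = "read1\t16\tchr1\t10468\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
rec = parse_sam_line(example)
print(rec["rname"], rec["pos"], rec["mapq"], rec["is_reverse"])
```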
Did the sequencing work?
Post-sequencing level
Sample relatedness
KING
check how different samples are related
especially important for Trio-based analysis
verify that the proband dataset is indeed the child of the parental dataset
Sample contamination/swap
tools
Picard CrosscheckFingerprints
verifyBamID
What are the effects?
e.g. in Cancer analysis
calling contaminant germline variants as somatic
e.g. in Trio analysis
mistakenly identify variants in VCF as de novo mutations, when the variants actually came from someone else
How could this happen?
many reasons
rotated sample plate
mislabeling of sample sheets
swap
reads contain DNA from another sample
contamination
reads contain a mixture of DNA from different samples
Reads quality assessment
MultiQC
FastQC
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
All are important but the most important ones are
Per base GC content
check if there is a problem with the library
Per base sequence quality
check quality of your sequence
During sequencing
e.g. Sequencing Real Time Analysis
https://supportassets.illumina.com/content/dam/illumina-support/images/featured-training/sav-overview.png
Demultiplexing
all libraries should be well-balanced
Error rate
Pacbio
Sequel/RSII
less than 15%
HiFi/Sequel II
less than 1%
Illumina
less than 0.5%
Percentage >Q30
in 70% of reads
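As background for the Q30 metric: a Phred quality score Q corresponds to an error probability of 10^(-Q/10), so Q30 means roughly 1 error in 1,000 base calls. The sketch below converts scores and computes the fraction of bases at or above Q30 for one FASTQ quality string. It assumes Phred+33 encoding and counts Q >= 30, which is the usual convention; the function names and the example string are invented for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def fraction_q30(quality_string, offset=33):
    """Fraction of bases with quality >= 30 in a FASTQ quality string.

    Assumes Phred+33 encoding (ASCII code minus 33 gives the score).
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    scores = [ord(char) - offset for char in quality_string]
    return sum(score >= 30 for score in scores) / len(scores)


print(phred_to_error_prob(30))     # about 0.001
print(fraction_q30("IIII#III??"))  # 'I' = Q40, '#' = Q2, '?' = Q30
```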
Number of reads
platform specific
MiSeq
>25M
> 100M
> 300M
Do I have enough sequencing reads?
Depth = (Read length x Number of reads) / Genome size
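The depth formula above can be tried out directly. The sketch below implements it as written and also rearranges it to estimate how many reads a target depth requires. The input numbers (150 bp reads, 600 million reads, a 3.1 Gb genome) are illustrative assumptions, not values from this map, and for paired-end data each mate is counted as one read.

```python
import math


def sequencing_depth(read_length_bp, num_reads, genome_size_bp):
    """Average depth = (read length x number of reads) / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return read_length_bp * num_reads / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp):
    """Same formula solved for the number of reads, rounded up."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# Illustrative inputs only: 150 bp reads, 600 million reads, 3.1 Gb genome.
depth = sequencing_depth(150, 600_000_000, 3_100_000_000)
print(f"{depth:.1f}x")
print(reads_needed(30, 150, 3_100_000_000))
```

The same functions work for a gene panel or exome if the size of the targeted region is used in place of the genome size, which is why smaller targets reach higher depth for the same number of reads.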
Gene Panel ~ WES > WGS
Gene Panel > WES > WGS
SNPs/short InDels
Gene Panel = WES = WGS
SV
WGS LR
WGS > WES > Gene Panel
What is it and why this is important?
Depth plays an important role for heterogeneous samples
e.g. cancer
In a tumor sample, normal cells tend to be observed together with tumor cells
2 populations: Normal and Tumor
We do not know the ratio of each
With high depth, modern bioinformatics tools are able to detect differences in read coverage
more reads
might be duplication
fewer reads
might be deletion
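The "more reads / fewer reads" reasoning can be illustrated with a toy depth-ratio check. The sketch below compares the depth of each genomic bin with the median depth across bins and labels bins whose log2 ratio crosses a threshold. The thresholds (0.4 and -0.6), the function name `classify_bins` and the example depths are all assumptions chosen for illustration; real copy-number callers also model tumor purity, GC bias and mappability, none of which is handled here.

```python
import math
from statistics import median


def classify_bins(bin_depths, gain_log2=0.4, loss_log2=-0.6):
    """Label each bin by comparing its depth with the median depth.

    gain_log2 and loss_log2 are illustrative thresholds, not recommended values.
    """
    baseline = median(bin_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    calls = []
    for depth in bin_depths:
        # A bin with zero reads has no finite log2 ratio; treat it as a loss.
        log2_ratio = math.log2(depth / baseline) if depth > 0 else float("-inf")
        if log2_ratio >= gain_log2:
            calls.append("possible duplication")
        elif log2_ratio <= loss_log2:
            calls.append("possible deletion")
        else:
            calls.append("neutral")
    return calls


# Invented per-bin mean depths for one sample.
depths = [30, 31, 29, 45, 46, 30, 15, 14, 30]
for depth, call in zip(depths, classify_bins(depths)):
    print(depth, call)
```

With low depth the random variation between bins is of the same order as these ratios, which is one reason high depth matters for mixed tumor and normal samples.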
During library prep, the genome is fragmented into short random fragments.
Sequencing adapters are added to these fragments
PCR amplify the libraries. ONLY the fragments with the adapters attached are amplified
These random fragments are then sequenced
The reads are then aligned to a reference genome to create longer stretches of sequence
example
purpose
Multiplexing sequencing
allow sequencing multiple samples in one run
Allow sequencer to recognize the fragments
ChIPSeq
Methylation
non-coding RNA
Targeted
mRNA
Total RNA
Targeted Gene
Whole Exome Sequencing (WES)
Whole Genome Sequencing (WGS)
Diagnosis of infectious diseases
e.g. Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility
e.g. SARS-CoV-2
Diagnosis of specific clinical presentations of suspected genetic diseases
e.g. Neuromuscular disorder
e.g. Germline cancer risk testing
e.g. Carrier screening for recessive genetic disorders
cystic fibrosis, Mendelian inherited disorders
e.g. Non-Invasive Prenatal Testing (NIPT)
common trisomy syndromes in fetuses
can be broadly divided into two groups based on the length of the sequencing reads
long-reads
Oxford Nanopore
PromethION
GridION
MinION
PacBio
Sequel II
Sequel
RS II
short-reads
Ion Torrent
Genexus System
Gene Studio S5
MGIseq
T7
G400
G50
Illumina-based platform
NovaSeq
HiSeq
NextSeq
MiSeq
MiniSeq