LYC project
Transcriptome Assembly
integrate gene set with cufflinks
map reads to the genome with tophat
Genome Assembly
Close gaps
GapCloser from SOAPdenovo2
Build scaffolds
Do not choose the best N50 (containing more linking errors)
Use SSPACE
Based on trimmed reads (36 x 2) from both PE and MP libraries
Build contigs
choose the best N50
Test on a broad range of K-mers
Do not apply pairing
Use ABySS
Based on reads from PE libraries
Reads cleaning
K-mer correction
Delete PCR duplicates
Trim adaptor contamination
Based on quality score
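The contig and scaffold steps above both refer to N50. As a minimal sketch of how that statistic could be compared across a range of k-mer sizes, assuming each assembly is available as a plain list of sequence lengths (the `assemblies` dictionary and its numbers are made up for illustration and are not project data):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical assemblies keyed by k-mer size (illustrative lengths only).
assemblies = {
    31: [100, 80, 60, 40, 20],
    41: [150, 90, 30, 20, 10],
}

# Pick the k-mer size whose assembly has the largest N50.
best_k = max(assemblies, key=lambda k: n50(assemblies[k]))
```

For the contig step this selects the best N50 directly; for the scaffold step the outline advises against taking the best N50, so there the value would only be one input among others.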
Sequencing
transcriptome
muscle
egg
liver
genomic DNA
MP libraries
9k
5k
3k
PE libraries
600 bp
300 bp
Evolution analyses
positive selection
run GO analysis on the PSGs (positively selected genes) and relate them to environmental factors
evolution speed
draw Ka/Ks ratio graph between LYC and another fish
run codeml on the single-copy gene families
gene families
GO/Pathway analyses
based on the CAFE results
family expansion/contraction
run CAFE to get the expansion/contraction
run modeltest and mrbayes to get the overall phylogenetic tree
get single copy families
draw venn diagram
based on the ortholog groups
treefam method
ortholog groups
draw ortholog groups
run 'treebest nj' to infer ortholog relations
run 'treebest best' on the cds for each cluster
get the cds seqs for the protein multi-alignment
run muscle to get multi-alignment
get protein seqs for each cluster
call gene clusters
run hcluster_sg
wublastp
compute edge weight (g1*g2)/max(g1,g2)
use solar to combine gene-to-gene blastp scores
blastp the db sequences to itself
Database preparation
together with LYC proteins, build wublast db
download protein sequences of 6 fish species from Ensembl
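Two steps of the treefam method above lend themselves to a short sketch: the edge weight formula and the selection of single-copy families. The code below is only an illustration under stated assumptions. The outline does not define `g1` and `g2`, so they are treated here as two positive scores for a gene pair, and the cluster layout (a dictionary of family id to `(species, gene_id)` pairs) is hypothetical rather than the actual hcluster_sg output format.

```python
from collections import Counter


def edge_weight(g1, g2):
    """Edge weight as written in the outline: (g1*g2)/max(g1,g2).

    Assumes g1 and g2 are positive numbers; what they measure is not
    stated in the outline, so their meaning here is an assumption.
    """
    return (g1 * g2) / max(g1, g2)


def single_copy_families(clusters, species):
    """Return ids of families with exactly one gene from every species.

    clusters: dict mapping family id -> list of (species, gene_id) pairs
              (hypothetical layout, not a real tool's file format).
    species:  iterable of all species names that must be present.
    """
    wanted = set(species)
    selected = []
    for family_id, genes in clusters.items():
        counts = Counter(sp for sp, _gene in genes)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected.append(family_id)
    return sorted(selected)


# Illustrative input with made-up gene names.
example_clusters = {
    "fam1": [("LYC", "g1"), ("zebrafish", "g2"), ("medaka", "g3")],
    "fam2": [("LYC", "g4"), ("LYC", "g5"), ("zebrafish", "g6"), ("medaka", "g7")],
    "fam3": [("LYC", "g8"), ("zebrafish", "g9")],
}
single_copy = single_copy_families(example_clusters, ["LYC", "zebrafish", "medaka"])
```

Note that for positive inputs the formula as written is mathematically the smaller of the two values, so `edge_weight(60, 40)` gives `40.0`.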
Annotation
maker pipeline
add transcriptome data
use Ensembl results as a starting point
ensembl pipeline
integration
similarity search
based on uniprot vertebrate proteins
based on online LYC ESTs
raw compute
others
ab initio prediction
repeat masking