CNV¶

baseq-CNV is a toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for whole-genome sequencing (WGS) data from both bulk and single-cell experiments.

Copy number is inferred from the read counts per genomic region (bin). The regions are predefined so that low-complexity parts of the genome are excluded or discounted.

The pipeline runs four steps:
Read alignment using Bowtie2
Bin counting for uniquely mapped reads
Normalization by GC content
Circular binary segmentation (CBS) to reduce noise

Result¶

Name	Description
sample.bowtie.bam	Aligned BAM file
sample.bin.counts.txt	Read counts for each bin in the dynamic_bin file
sample.CNV_plot_[size].png	CNV plot for each bin size
sample.GC.png	GC content plot

Pipeline¶

Run the complete pipeline with a single command:

baseq-CNV run_pipeline ./Tn5_S1.fq.gz -g hg19

Alignment¶

Reads are aligned with Bowtie2.

baseq-CNV align -1 Tn5_S1.fq.gz -r 4000000 -g hg19 -t 10

BinCounting¶

Count reads according to the dynamic bin file … The command is:

baseq-CNV bincount -g hg19 -i ./sample.bam -o ./bincounts.txt

Normalize¶

Normalize the raw read counts by bin length and GC content.

baseq-CNV normalize -g hg19 -i ./bincounts.txt -o bincounts_norm.txt

CBS¶

Segment the normalized bin counts with CBS.

baseq-CNV cbs -i ./bincounts_norm.txt  -o ./out.file

http://p8v379qr8.bkt.clouddn.com/CNV_normalize.png

Plot¶

Plot genomic…

baseq-CNV plotgenome -i ./bincounts_norm.txt -c ./out.file
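The step-by-step commands above can be chained from a script. The following is a minimal sketch that only assembles the command lines shown in this section; the helper name `build_step_commands` is hypothetical, the intermediate file names are placeholders chosen so that each step's output feeds the next, and the BAM path is an assumption because the `align` command above does not name its output.

```python
import shlex

def build_step_commands(fastq, genome="hg19", bam="./sample.bam"):
    """Return the documented step commands as argument lists (nothing is executed)."""
    counts = "./bincounts.txt"        # raw bin counts (placeholder name)
    norm = "./bincounts_norm.txt"     # normalized bin counts (placeholder name)
    segments = "./out.file"           # CBS output (placeholder name)
    return [
        ["baseq-CNV", "align", "-1", fastq, "-r", "4000000", "-g", genome, "-t", "10"],
        ["baseq-CNV", "bincount", "-g", genome, "-i", bam, "-o", counts],
        ["baseq-CNV", "normalize", "-g", genome, "-i", counts, "-o", norm],
        ["baseq-CNV", "cbs", "-i", norm, "-o", segments],
        ["baseq-CNV", "plotgenome", "-i", norm, "-c", segments],
    ]

# Print the commands; to execute one, pass it to subprocess.run(cmd, check=True).
for cmd in build_step_commands("Tn5_S1.fq.gz"):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when file names contain spaces.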

Config¶

[CNV]
bowtie2 = /mnt/gpfs/Database/softs/anaconda2/bin/bowtie2
samtools = /mnt/gpfs/Database/softs/anaconda2/bin/samtools

[CNV_ref_hg19]
bowtie2_index = /mnt/gpfs/Database/ref/hg19/hg19
dynamic_bin = /mnt/gpfs/Users/zhangxiannian/basematic/cnv/hg19.dynabin.txt
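The configuration above is in INI format, so it can be inspected with Python's standard `configparser`. This is a sketch for checking a config file, not a description of how baseq-CNV itself loads it; the helper `load_cnv_settings` is hypothetical, and it assumes that reference sections follow the `CNV_ref_<genome>` naming seen in the example.

```python
import configparser

CONFIG_TEXT = """
[CNV]
bowtie2 = /mnt/gpfs/Database/softs/anaconda2/bin/bowtie2
samtools = /mnt/gpfs/Database/softs/anaconda2/bin/samtools

[CNV_ref_hg19]
bowtie2_index = /mnt/gpfs/Database/ref/hg19/hg19
dynamic_bin = /mnt/gpfs/Users/zhangxiannian/basematic/cnv/hg19.dynabin.txt
"""

def load_cnv_settings(text, genome):
    """Return (tool paths, reference paths) for one genome as plain dicts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tools = dict(parser["CNV"])
    # Assumption: one reference section per genome, named CNV_ref_<genome>.
    reference = dict(parser["CNV_ref_" + genome])
    return tools, reference

tools, reference = load_cnv_settings(CONFIG_TEXT, "hg19")
print(tools["bowtie2"])
print(reference["dynamic_bin"])
```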

Quality Control¶

Alignment information and MAD

Alignment: total reads and mapping ratio
MAD: median absolute deviation, which indicates the technical noise level of the sample
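To make the MAD metric concrete, here is a minimal sketch of a median absolute deviation in plain Python. The page does not say which values baseq-CNV feeds into the MAD, so both variants are assumptions: `mad` works on the values directly, while `neighbor_diff_mad` (a hypothetical helper) works on differences between adjacent bins, a common choice because real copy number steps then contribute little to the noise estimate.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)

def neighbor_diff_mad(values):
    """MAD of the differences between adjacent bins (assumed variant)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return mad(diffs)

print(mad([1, 2, 3, 4, 100]))
print(neighbor_diff_mad([1, 1, 1, 5, 5, 5]))
```

A single outlier or a clean copy number step barely moves either value, which is why a MAD is preferred over a standard deviation as a noise measure.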

Dynamic Bins¶

The dynamic bin file can be downloaded from GitHub.

The file contains the data in columns.

APIs¶

Align¶

baseq.align.bowtie2.bowtie2_sort(fq1, fq2, bamfile, genome, reads=5000000, thread=8)

Align the FASTQ reads with Bowtie2 and sort the resulting alignments.

from baseq.align.bowtie2 import bowtie2_sort

# Single-end reads: pass an empty string for fq2
bowtie2_sort("read.1.fq.gz", "", "sample.bam", "hg19")

# Paired-end reads
bowtie2_sort("read.1.fq.gz", "read.2.fq.gz", "sample.bam", "hg19")

Results:

sample.bam
sample.bam.stats

Bincount¶

baseq.cnv.bincount.counting(genome, bamfile, out)

Count reads per bin, using bisect to look up each read position in the dynamic bins.

from baseq.cnv.bincount import counting
counting("hg19", "aligned.bam", "bincount.txt")

This will generate:

bincount.txt
# A TSV file with two columns: index and counts

Process:

Read the dynamic bin file;
Read the BAM file using the samtools view command;
Keep the reads with mapping quality >= 40;
Map each genome position to a bin ID and sum the reads per bin;
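The lookup step in the process above can be sketched with Python's `bisect` module. This is an illustration under stated assumptions, not the library's code: the function `count_reads` is hypothetical, bins are given as sorted start positions per chromosome, and reads arrive as `(chrom, pos, mapq)` tuples already parsed from `samtools view` output.

```python
from bisect import bisect_right
from collections import Counter

def count_reads(bin_starts, reads, min_mapq=40):
    """Count reads per bin.

    bin_starts: {chrom: sorted list of bin start positions} (assumed layout)
    reads: iterable of (chrom, pos, mapq) tuples
    Returns a list of counts indexed by a global bin ID.
    """
    # Give every bin a global ID, following the chromosome order of bin_starts.
    offsets, total = {}, 0
    for chrom, starts in bin_starts.items():
        offsets[chrom] = total
        total += len(starts)

    counts = Counter()
    for chrom, pos, mapq in reads:
        if mapq < min_mapq or chrom not in bin_starts:
            continue  # drop low-quality reads and unknown chromosomes
        # Index of the last bin whose start is <= pos.
        i = bisect_right(bin_starts[chrom], pos) - 1
        if i >= 0:
            counts[offsets[chrom] + i] += 1
    return [counts[k] for k in range(total)]

bins = {"chr1": [0, 100, 250], "chr2": [0, 500]}
reads = [("chr1", 50, 42), ("chr1", 100, 42), ("chr1", 249, 40),
         ("chr1", 300, 10), ("chr2", 700, 42), ("chrM", 5, 42)]
print(count_reads(bins, reads))
```

Because each lookup is a binary search, the cost per read grows with the logarithm of the number of bins, which matters when bins have variable width and cannot be found by simple division. Note that this sketch assigns any position past the last bin start to the last bin; it does not check bin ends.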

Normalize¶

baseq.cnv.normalize.normalize(genome, bincount, name)

Normalize the raw bin counts by bin length and GC content, and estimate the ploidy.

from baseq.cnv.normalize import normalize
normalize("hg19", "bincounts.txt", "CNVsample")

This will generate two files:

Norm.Counts.CNVsample.txt
# Columns: 'chr', 'start', 'absstart', 'norm_by_GC', 'norm_by_GC_Ploidy'
Norm.CNVsample.png

Process:

Read the dynamic bin file;
Aggregate the bins into 500 kb bins;
Normalize by bin length;
Normalize by GC content;
Detect the ploidy;

Output: a GC content image and the normalized bin counts (1M).
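The length and GC normalization steps can be illustrated with a small sketch. The exact method used by baseq-CNV is not described on this page, so the approach here is an assumption: counts are converted to reads per base and scaled to a median of 1, then each bin is divided by the median of all bins in the same GC stratum. The function name `normalize_counts` and the `gc_step` parameter are hypothetical, GC is assumed to be a fraction between 0 and 1, and ploidy estimation is not covered.

```python
from statistics import median

def normalize_counts(counts, lengths, gc, gc_step=0.05):
    """Return per-bin values normalized by bin length, then by GC stratum."""
    # Step 1: reads per base, scaled so that the median bin equals 1.
    density = [c / l for c, l in zip(counts, lengths)]
    center = median(density)
    if center <= 0:
        raise ValueError("median read density must be positive")
    scaled = [d / center for d in density]

    # Step 2: group bins into GC strata of width gc_step.
    strata = {}
    for value, g in zip(scaled, gc):
        strata.setdefault(int(g / gc_step), []).append(value)
    stratum_median = {key: median(vals) for key, vals in strata.items()}

    # Step 3: divide each bin by the median of its GC stratum.
    result = []
    for value, g in zip(scaled, gc):
        m = stratum_median[int(g / gc_step)]
        result.append(value / m if m > 0 else 0.0)
    return result

counts = [100, 200, 120, 240]
lengths = [1000, 2000, 1000, 2000]
gc = [0.42, 0.43, 0.62, 0.63]
print(normalize_counts(counts, lengths, gc))
```

In the toy input the high-GC bins carry 20% more reads per base than the low-GC bins; after stratified normalization all four bins come out equal, which is the intended effect of removing GC bias.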

Segmentation¶

baseq.cnv.segment.CBS(infile, path_out)

Run the DNACopy.R script. Usage:

from baseq.cnv.segment import CBS
CBS("bincounts_norm.txt", "outfile.txt")
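For readers unfamiliar with circular binary segmentation, the following sketch shows only its core search: find the arc of bins whose mean differs most from the mean of the remaining bins. It is a simplified illustration, not what the R script does in full; a complete CBS implementation also scales by the noise level, tests the best arc for significance by permutation, and recurses into the resulting segments. The function name `best_arc` is hypothetical.

```python
from math import sqrt

def best_arc(values):
    """Return (z, i, j) for the arc values[i:j] that maximizes |z|.

    z compares the mean inside the arc with the mean of all other bins,
    weighted by the number of bins on each side.
    """
    n = len(values)
    # prefix[k] holds sum(values[:k]) so that arc sums cost O(1).
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    total = prefix[n]

    best = (0.0, 0, 0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue  # the arc must leave at least one bin outside
            mean_in = (prefix[j] - prefix[i]) / inside
            mean_out = (total - prefix[j] + prefix[i]) / outside
            z = (mean_in - mean_out) / sqrt(1.0 / inside + 1.0 / outside)
            if abs(z) > abs(best[0]):
                best = (z, i, j)
    return best

profile = [0.0] * 10 + [5.0] * 5 + [0.0] * 10
z, start, end = best_arc(profile)
print(start, end)
```

The exhaustive search is quadratic in the number of bins, which is acceptable for a few thousand aggregated bins but is one reason production implementations use faster approximations.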

Visualize¶

whole genome¶

baseq.cnv.plots.genome.plot_genome(bincount, cbs_path, name)

Usage:

from baseq.cnv.plots.genome import plot_genome
plot_genome("sample.norm.txt", "segment.txt", "sample")
# Output: CNV.genome.sample.png

http://p8v379qr8.bkt.clouddn.com/Genome12.png
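A whole-genome plot places every bin on one x axis, so each bin needs a coordinate that runs across concatenated chromosomes. The sketch below shows one way to derive such a coordinate; it is an assumption that the `absstart` column in the normalized counts file holds this kind of value. The function `absolute_starts` is hypothetical, and the chromosome lengths are toy numbers, not real hg19 lengths.

```python
def absolute_starts(bins, chrom_lengths):
    """Convert (chrom, start) pairs to positions on a concatenated genome axis.

    bins: list of (chrom, start) tuples
    chrom_lengths: list of (chrom, length) tuples in plotting order
    """
    # Offset of each chromosome = total length of the chromosomes before it.
    offsets, total = {}, 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = total
        total += length
    return [offsets[chrom] + start for chrom, start in bins]

toy_lengths = [("chr1", 1000), ("chr2", 800)]  # toy values for illustration
toy_bins = [("chr1", 0), ("chr1", 500), ("chr2", 0), ("chr2", 300)]
print(absolute_starts(toy_bins, toy_lengths))
```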

baseq.cnv.plots.genome.plot_genome_multiple(bincount, cbs_path, path_out)

Plot multiple genomes in the same figure.

from baseq.cnv.plots.genome import plot_genome_multiple
plot_genome_multiple("sample.norm.txt", "segment.txt", "sample")

http://p8v379qr8.bkt.clouddn.com/Genome_20.png

region¶

baseq.cnv.plots.region.plot_region(bincount, cbs_path, path_out)

Plot a region of the genome…

ToDo: …….
