BAM

Functions

  • Read a BAM file and compute its statistics (read count, mapping ratio…);
  • Get the depth for a genomic region (and visualize it);
  • Get the reads that overlap a genomic region.

Each alignment record has 11 mandatory fields:

Col  Field  Type    Brief Description
1    QNAME  String  Query template NAME
2    FLAG   Int     bitwise FLAG
3    RNAME  String  Reference sequence NAME
4    POS    Int     1-based leftmost mapping POSition
5    MAPQ   Int     MAPping Quality
6    CIGAR  String  CIGAR string
7    RNEXT  String  Reference name of the mate/next read
8    PNEXT  Int     Position of the mate/next read
9    TLEN   Int     observed Template LENgth
10   SEQ    String  segment SEQuence
11   QUAL   String  ASCII of Phred-scaled base QUALity+33
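As a rough illustration of the column layout above, the following sketch splits one tab-separated alignment line (as printed by samtools view) into the 11 mandatory fields. SamRecord and parse_sam_line are hypothetical names used only for this example; they are not part of baseq.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """Hypothetical container for the 11 mandatory alignment fields."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-separated alignment line; optional tags after column 11 are ignored."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10])


# Made-up record, for illustration only.
example = "read1\t99\tchr1\t1000\t60\t4M\t=\t1200\t250\tACGT\tIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
```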

Design

Most functions are built on “samtools”, which must be version 1.3.0 or later:

  • samtools depth: to get the coverage depth;
  • samtools view <bam> chrN:start-end: to get the reads overlapping a region.
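A minimal sketch of how these two samtools commands might be driven from Python. The helper names (depth_command, view_command, run) are assumptions made for this example and may not match how baseq calls samtools internally; region queries with samtools view also require an indexed BAM file.

```python
import subprocess


def depth_command(bam_path, chrom, start, end):
    # `samtools depth -r` restricts the output to a single region.
    return ["samtools", "depth", "-r", f"{chrom}:{start}-{end}", bam_path]


def view_command(bam_path, chrom, start, end):
    # `samtools view <bam> <region>` prints the reads overlapping the region.
    return ["samtools", "view", bam_path, f"{chrom}:{start}-{end}"]


def run(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, run(depth_command("sample.bam", "chr1", 1000, 2000)) would return one line per reported position, provided samtools is installed and on the PATH.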

Class

class baseq.bam.BAMTYPE(path, bedfile='')

BAM file handler based on samtools. On initialization, it reads the file at path with samtools and parses the header.

Usage:
  • Stats on enrichment quality.

get_columns(rows=10000, colIdx=6)

Read the BAM file with samtools and return the values in column <colIdx> for the first <rows> rows. The columns of a BAM file are:

  1. read name (header)
  2. flags
  3. chromosome
  4. start
  5. mapping quality
  6. cigar

colIdx starts from 1.

BAMTYPE(path).get_columns(1000, 3)
# ['chr1', 'chr1', ...]
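The column extraction can be pictured with the following sketch, which works on any iterable of tab-separated alignment lines. column_values is a hypothetical stand-in, not the actual get_columns implementation, and the sample lines are made up.

```python
from itertools import islice


def column_values(sam_lines, rows=10000, col_idx=6):
    """Return column `col_idx` (1-based) from the first `rows` alignment lines."""
    values = []
    for line in islice(sam_lines, rows):
        fields = line.rstrip("\n").split("\t")
        values.append(fields[col_idx - 1])  # convert the 1-based index to 0-based
    return values


# Made-up alignment lines, truncated to the first six columns.
lines = [
    "r1\t0\tchr1\t100\t60\t50M",
    "r2\t16\tchr1\t150\t60\t30M20S",
    "r3\t0\tchr2\t10\t0\t50M",
]
print(column_values(lines, rows=2, col_idx=3))  # ['chr1', 'chr1']
```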
get_reads(chr, start, end)

Return the reads that overlap the region chrN:start-end.

  • Reads whose CIGAR string contains an “N” operation (a skipped region on the reference) are excluded.
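One way such a filter could look, assuming the reads arrive as tab-separated alignment lines with the CIGAR string in column 6. has_skipped_region and keep_unspliced are illustrative names, not baseq functions.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_skipped_region(cigar):
    """True if the CIGAR string contains an N (skipped reference region) operation."""
    return any(op == "N" for _length, op in CIGAR_OP.findall(cigar))


def keep_unspliced(sam_lines):
    """Keep only the alignment lines whose CIGAR (column 6) has no N operation."""
    kept = []
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if not has_skipped_region(fields[5]):
            kept.append(line)
    return kept
```

An N operation typically marks an intron in a spliced alignment, so this keeps only reads aligned contiguously (apart from insertions, deletions and clipping).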
read_counts(chr, star, end)

region_bed_depth(bedfile)

The depth of the regions in a BED file.

region_depth(chr, start, end, all=False)

Get the per-base coverage depth in the region. Chromosome names with or without the “chr” prefix (“chr1” and “1”) are both accepted.

Parameters: all – Whether bases with zero coverage should be returned.

Usage:

BAMTYPE(path).region_depth("chr1", 1000, 2000, all=True)
# returns a depth list, e.g. [0, 1, 1, 1, 2, 2, 2, 3, 0]
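The sketch below shows one way the output of samtools depth (tab-separated chromosome, 1-based position, depth) could be turned into such a list, and how the two chromosome naming styles might be reconciled. Both helpers, fill_depths and match_chrom, are hypothetical; treating the region as inclusive on both ends and filling zeros in Python are assumptions of this example.

```python
def match_chrom(name, reference_names):
    """Pick whichever of 'chr1' / '1' is present among the BAM reference names."""
    alternate = name[3:] if name.startswith("chr") else "chr" + name
    for candidate in (name, alternate):
        if candidate in reference_names:
            return candidate
    raise KeyError(f"{name} not found in reference names")


def fill_depths(depth_lines, start, end, include_zero=False):
    """Convert `samtools depth` lines into a list of depths for start..end (inclusive)."""
    covered = {}
    for line in depth_lines:
        _chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        covered[int(pos)] = int(depth)
    if include_zero:
        # One value per base, with 0 where samtools reported nothing.
        return [covered.get(pos, 0) for pos in range(start, end + 1)]
    return [covered[pos] for pos in sorted(covered) if start <= pos <= end]
```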
stats_bam()

Read the bampath.stat file; if it does not exist, run samtools flagstat. The results are stored in:

  1. self.reads_total
  2. self.reads_mapped
  3. self.mapping_ratio
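As a hedged illustration, the three values could be derived from samtools flagstat text as follows. parse_flagstat is a hypothetical helper, the sample report is made up, and returning the ratio as a fraction between 0 and 1 is an assumption.

```python
import re


def parse_flagstat(text):
    """Extract total reads, mapped reads and their ratio from `samtools flagstat` text."""
    total = mapped = None
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ \d+ in total", line)
        if m:
            total = int(m.group(1))
        # Matches the "N + M mapped (...)" line, not "primary mapped" or "mate mapped".
        m = re.match(r"(\d+) \+ \d+ mapped", line)
        if m:
            mapped = int(m.group(1))
    if total is None or mapped is None:
        raise ValueError("not a flagstat report")
    ratio = mapped / total if total else 0.0
    return {"reads_total": total, "reads_mapped": mapped, "mapping_ratio": ratio}


# Abbreviated, made-up flagstat report.
sample = """1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
950 + 0 mapped (95.00% : N/A)
900 + 0 with itself and mate mapped
"""
stats = parse_flagstat(sample)
```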
stats_bases()

Compute the mean match length over the first 100K reads in the BAM file.
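A possible reading of "match length", sketched below, is the number of bases covered by M, = and X operations in each read's CIGAR string; this interpretation is an assumption, and matched_length and mean_match_length are hypothetical names.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def matched_length(cigar):
    """Sum the lengths of the alignment-match operations (M, =, X) in a CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar) if op in "M=X")


def mean_match_length(cigars):
    """Average matched length over reads; unmapped reads (CIGAR '*') are ignored."""
    lengths = [matched_length(c) for c in cigars if c != "*"]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

The CIGAR strings themselves could come from column 6, for instance via get_columns(100000, 6).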

stats_duplicates()

Compute the duplication rate from the first 1M reads. Duplicates must already be marked in the FLAG field.
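In the SAM FLAG field, bit 0x400 (1024) marks a read as a PCR or optical duplicate, so a duplication rate can be estimated from the FLAG column alone. The sketch below assumes the flags have already been read (for example from column 2); duplication_rate is a hypothetical helper.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def duplication_rate(flags):
    """Fraction of reads whose FLAG has the duplicate bit set."""
    flags = [int(f) for f in flags]  # accept ints or strings from a text column
    if not flags:
        return 0.0
    duplicates = sum(1 for f in flags if f & DUPLICATE_FLAG)
    return duplicates / len(flags)


print(duplication_rate([99, 1123, 147, 1171]))  # 0.5
```

The bit is only set when a duplicate-marking tool has been run on the BAM file; otherwise the rate will read as zero.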

stats_region_coverage(numbers=1000)

Check the enrichment quality.

  • Requires a bedfile when the class is initialized
  • Selects <numbers> regions at random
  • Uses a multithreaded pool to get the coverage depth of the regions
  • Reports the ratio of bases covered at 10X, 30X, 50X and 100X

Usage:

BAMTYPE("sample.bam", "panel.bed").stats_region_coverage(1000)

The results are saved as object properties:

self.mean_depth/self.pct_10X/..
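The procedure can be sketched roughly as follows. Both functions are hypothetical: depth_func stands in for any callable that returns per-base depths for (chr, start, end), and reporting each ratio as a fraction between 0 and 1 is an assumption about the meaning of the pct_* properties.

```python
import random
from multiprocessing.pool import ThreadPool


def coverage_summary(depths, thresholds=(10, 30, 50, 100)):
    """Mean depth and the fraction of bases at or above each depth threshold."""
    depths = list(depths)
    n = len(depths)
    summary = {"mean_depth": sum(depths) / n if n else 0.0}
    for t in thresholds:
        covered = sum(1 for d in depths if d >= t)
        summary[f"pct_{t}X"] = covered / n if n else 0.0
    return summary


def sample_region_depths(regions, depth_func, numbers=1000, threads=4, seed=None):
    """Pick up to `numbers` regions at random and collect their depths with a thread pool."""
    rng = random.Random(seed)
    chosen = rng.sample(regions, min(numbers, len(regions)))
    with ThreadPool(threads) as pool:
        per_region = pool.map(lambda region: depth_func(*region), chosen)
    # Flatten the per-region lists into one list of per-base depths.
    return [depth for region in per_region for depth in region]
```

A thread pool fits here because each depth query mostly waits on an external samtools process rather than on Python computation.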