Drops:¶

Design¶

The key design questions for the Drops pipeline:

Splitting reads by cell barcode;
Tagging reads by genome position.

Todos¶

To be completed.

APIs¶

Extract and count barcodes¶

baseq.drops.barcode.count.count_barcodes(path, output, protocol, min_reads, topreads=100)[source]¶

Count the number of occurrences of each barcode.

Parameters:
path – FASTQ file.
output – The stats will be written to …
protocol – Protocol.
min_reads – Minimum reads.
topreads – Process at most N million reads.

Returns: A barcode_count file is generated, containing: cellbarcode/counts
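
As a rough illustration of the counting step, here is a minimal sketch, not the baseq implementation. It assumes a standard four-line FASTQ layout and takes a barcode extractor as an argument. The names count_barcodes_sketch, fastq_lines, extract and max_reads are hypothetical; max_reads is a plain read count, whereas topreads above is given in millions. Dropping barcodes below min_reads is also an assumption about what that parameter does.

```python
from collections import Counter
from itertools import islice


def count_barcodes_sketch(fastq_lines, extract, min_reads=1, max_reads=None):
    """Count barcodes over FASTQ lines (hypothetical helper, not baseq's API).

    fastq_lines: iterable of text lines in 4-line FASTQ records.
    extract:     callable mapping a read sequence to a barcode, or "" if invalid.
    """
    counts = Counter()
    # The sequence is the 2nd line of every 4-line FASTQ record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    if max_reads is not None:
        # Stop after max_reads records.
        seq_lines = islice(seq_lines, max_reads)
    for line in seq_lines:
        barcode = extract(line.strip())
        if barcode:  # skip reads without a valid barcode
            counts[barcode] += 1
    # Assumption: barcodes seen fewer than min_reads times are dropped.
    return {bc: n for bc, n in counts.items() if n >= min_reads}
```

With a real file, the lines could come from an opened (possibly gzip-compressed) FASTQ handle, and the resulting dictionary could be written out as two columns, cellbarcode and counts.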

baseq.drops.barcode.count.extract_barcode(protocol, seq)[source]¶

Extract the cell barcode from a read sequence:

10X: seq[0:16]

indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is length of barcode 1)

dropseq: seq[0:12]

Parameters:
protocol – 10X/indrop/drop-seq.
seq – The sequence containing the cell barcode.

Returns: barcode – the cell barcode; “” if there is no valid barcode.
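
The slicing rules above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name extract_barcode_sketch and the bc1_len parameter are hypothetical (the source only says that i is the length of barcode 1, not how it is determined), protocol names are matched case-insensitively with hyphens ignored, and "valid" is taken to mean only that the read is long enough to yield a full-length barcode.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=8):
    """Slice a cell barcode out of `seq` (hypothetical helper).

    bc1_len is an assumed, caller-supplied length of inDrop barcode 1.
    """
    key = protocol.lower().replace("-", "")
    if key == "10x":
        barcode, expected = seq[0:16], 16
    elif key == "dropseq":
        barcode, expected = seq[0:12], 12
    elif key == "indrop":
        i = bc1_len
        # Barcode 1, then skip 22 bases, then the 8-base barcode 2.
        barcode, expected = seq[0:i] + seq[i + 22:i + 22 + 8], i + 8
    else:
        return ""  # unknown protocol
    # Assumption: a barcode is valid only if the read was long enough.
    return barcode if len(barcode) == expected else ""
```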

Correct barcodes and compute stats¶

baseq.drops.barcode.stats.valid_barcode(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='./bc_stats.txt')[source]¶

Aggregate the mismatched barcodes and get the total_reads:

Read the barcode counts file;
Correct barcodes with a 1bp mismatch;
Count the reads and sequences of the mismatched barcodes;
Determine whether the last base is mutated (A/T/C/G appear at similar ratios at the last base);
Filter by whitelist;
Filter by read counts (>=min_reads);
Print the number of barcodes and reads retained after each step.

Parameters:
protocol – 10X/Dropseq/inDrop.
barcode_count – The barcode_count file.
min_reads – Minimum number of reads for a cell.
output – Path or name of the output (./bc_stats.txt).

Returns: Writes a bc_stats.csv file which contains: barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base
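
The 1bp-mismatch aggregation and the read-count filter can be sketched as below. This is one plausible greedy scheme, not necessarily what valid_barcode does: barcodes are visited from most to least abundant, and each absorbs any remaining barcode at Hamming distance 1. The function names and the total_reads field are hypothetical, and applying min_reads to the aggregated total (rather than to the raw count) is an assumption. The whitelist filter and the last-base check are left out.

```python
def hamming1_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence differing from `barcode` at exactly one position."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def aggregate_mismatches(counts, min_reads=2000):
    """Merge 1bp-mismatch barcodes into their most abundant neighbour."""
    remaining = dict(counts)
    stats = {}
    # Visit barcodes from most to least abundant.
    for bc in sorted(counts, key=counts.get, reverse=True):
        if bc not in remaining:
            continue  # already absorbed by a more abundant barcode
        own = remaining.pop(bc)
        mismatch_reads = 0
        mismatch_bc = 0
        for nb in hamming1_neighbors(bc):
            if nb in remaining:
                mismatch_reads += remaining.pop(nb)
                mismatch_bc += 1
        stats[bc] = {
            "counts": own,
            "mismatch_reads": mismatch_reads,
            "mismatch_bc": mismatch_bc,
            "total_reads": own + mismatch_reads,
        }
    # Assumption: the min_reads filter applies to the aggregated total.
    return {bc: s for bc, s in stats.items() if s["total_reads"] >= min_reads}
```

A greedy pass like this is simple but order-dependent: a low-count barcode that sits one mismatch away from two abundant barcodes is credited to whichever is visited first.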

Barcode Split¶

baseq.drops.barcode.split.split_16(name, protocol, bcstats, fq1, fq2, dir, topreads=10)[source]¶

Split the reads into 16 files according to the valid barcodes in the bcstats file.

Determine whether the last base mutates;
Filter by whitelist;

Parameters:
protocol – 10X/Dropseq/inDrop.
name – barcode_count.
bcstats – Valid barcodes.
output – (./bc_stats.txt)

Returns: The split reads are written to XXXX/split.AA.fa
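
One way to arrive at 16 output files is to bucket reads by the first two bases of their barcode (4 × 4 = 16 prefixes, AA through TT), which would match the split.AA.fa naming. That reading is an assumption, and the sketch below works in memory on (barcode, read) pairs rather than on FASTQ files; split_by_prefix and its arguments are hypothetical names.

```python
from itertools import product

# The 16 two-base prefixes: AA, AC, AG, AT, CA, ..., TT.
BUCKETS = ["".join(p) for p in product("ACGT", repeat=2)]


def split_by_prefix(records, valid_barcodes):
    """Group (barcode, read) pairs into 16 buckets by barcode prefix.

    Reads whose barcode is not in `valid_barcodes` are discarded.
    """
    buckets = {prefix: [] for prefix in BUCKETS}
    for barcode, read in records:
        if barcode not in valid_barcodes:
            continue
        bucket = buckets.get(barcode[:2])
        if bucket is not None:  # ignore prefixes containing N or other bases
            bucket.append((barcode, read))
    return buckets
```

Each bucket could then be written to its own file, so that downstream steps can process the 16 parts independently.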

Reads Tagging¶

Alternative Polyadenylation¶

baseq.drops.apa.scaner.scan(bam, name, chr, start, end, min_depth=10)[source]¶

Scan the genome…

baseq.drops.apa.samples.APA_usage(bamfile, APA_sitefile, celltype, gene)[source]¶

Get the abundance of each APA site in the gene for each cell barcode.

Parameters:
bamfile – BAM file.
APA_sitefile – File of APA sites.
celltype – (optional) The celltype file generated by cellranger.

Returns:

Generate a heatmap; Print the Read cou
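
The per-barcode, per-site abundance described for APA_usage can be pictured as a small count table. The sketch below is only an illustration: it assumes reads have already been reduced to (cell barcode, position) pairs and that each APA site is a half-open interval (start, end). Neither the input shapes nor the name apa_usage_sketch come from baseq.

```python
from collections import defaultdict


def apa_usage_sketch(reads, sites):
    """Count reads per cell barcode and APA site.

    reads: iterable of (barcode, position) pairs.
    sites: list of (start, end) half-open intervals, one per APA site.
    Returns {barcode: [count for site 0, count for site 1, ...]}.
    """
    usage = defaultdict(lambda: [0] * len(sites))
    for barcode, pos in reads:
        for k, (start, end) in enumerate(sites):
            if start <= pos < end:
                usage[barcode][k] += 1
                break  # a read is assigned to the first matching site only
    return dict(usage)
```

A table of this shape, one row per barcode and one column per site, is the kind of input a heatmap would be drawn from.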

baseq.drops.apa.genes.scan_genes(genome, bam, name)[source]¶

baseq.drops.apa.UTR.scan_utr(genome, bam, name)[source]¶

For a genome:

Read the GENCODE annotation and get the longest UTR for each gene (>=1000bp);
Apply the ‘scan’ function to each UTR (default 20 threads…);
Call the peaks for each UTR;
Build and write the APA peaks for all the genes.
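
The UTR selection step (longest UTR per gene, at least 1000bp) can be sketched as follows, assuming the annotation has already been parsed into (gene, chrom, start, end) tuples. The tuple layout and the name longest_utr_per_gene are hypothetical; parsing the GENCODE file itself is not shown.

```python
def longest_utr_per_gene(utrs, min_length=1000):
    """Keep, for each gene, its longest UTR of at least `min_length` bp.

    utrs: iterable of (gene, chrom, start, end) tuples.
    Returns {gene: (chrom, start, end)}.
    """
    best = {}
    for gene, chrom, start, end in utrs:
        length = end - start
        if length < min_length:
            continue  # too short to scan
        current = best.get(gene)
        if current is None or length > current[2] - current[1]:
            best[gene] = (chrom, start, end)
    return best
```

Genes whose UTRs are all shorter than the threshold simply do not appear in the result, so they are never passed on to the scan step.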
