DropsRNA¶

Design¶

The key design questions for the Drops pipeline:

Splitting the reads by barcode;
Tagging reads by genome position.

Functions¶

Count barcode¶

baseq.drops.barcode.count.count_barcodes(path, output, protocol, min_reads, topreads=100)[source]¶

Count the number of occurrences of each barcode.

from baseq.drops.barcode.count import count_barcodes
count_barcodes("10X.1.fq.gz", "bc.counts.txt", "10X", min_reads=50, topreads=1000)

Return:

#A barcode_count file in csv: bc.counts.txt
cellbarcode, counts
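The counting step can be illustrated with a minimal sketch. This is not the baseq implementation: the function name `count_barcodes_sketch`, the `barcode_len` parameter, and the reading of `min_reads` as an inclusive lower bound are assumptions made for illustration, and the reads are passed in as plain strings rather than read from a FASTQ file.

```python
from collections import Counter

def count_barcodes_sketch(read_seqs, barcode_len=16, min_reads=2):
    """Count reads per barcode prefix (hypothetical helper, not the baseq API).

    Assumes the barcode is the first `barcode_len` bases of each read and
    that barcodes with fewer than `min_reads` reads are discarded.
    """
    counts = Counter(
        seq[:barcode_len] for seq in read_seqs if len(seq) >= barcode_len
    )
    return {bc: n for bc, n in counts.items() if n >= min_reads}

reads = [
    "AAAACCCCGGGGTTTTACGT",
    "AAAACCCCGGGGTTTTTTTT",
    "CCCCAAAAGGGGTTTTACGT",
]
kept = count_barcodes_sketch(reads, barcode_len=16, min_reads=2)

# Write the result in the two-column layout described above.
lines = ["cellbarcode,counts"] + [f"{bc},{n}" for bc, n in sorted(kept.items())]
```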

baseq.drops.barcode.count.extract_barcode(protocol, seq)[source]¶

Extract cell barcode from reads

10X: seq[0:16]
indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is length of barcode 1)
dropseq: seq[0:12]

Usage:

from baseq.drops.barcode.count import extract_barcode
extract_barcode("10X", "ATCGATCGATCGACTAAATTTTTTT")

Result:: barcode: the barcode sequence; returns “” if there is no valid barcode.
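The slicing rules listed for each protocol can be sketched as follows. The function `extract_barcode_sketch` and its `bc1_len` argument are hypothetical: the documentation does not say how the length of barcode 1 is determined for indrop, so here it has to be supplied by the caller. Returning an empty string for reads that are too short is also an assumption.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=None):
    """Slice a cell barcode out of a read (illustrative, not the baseq code)."""
    if protocol == "10X":
        return seq[0:16] if len(seq) >= 16 else ""
    if protocol == "dropseq":
        return seq[0:12] if len(seq) >= 12 else ""
    if protocol == "indrop":
        # bc1_len plays the role of `i` in the rule above; how the real
        # function finds it is not documented, so the caller must pass it.
        if bc1_len is None:
            return ""
        end = bc1_len + 22 + 8
        if len(seq) < end:
            return ""
        return seq[0:bc1_len] + seq[bc1_len + 22:end]
    return ""

bc_10x = extract_barcode_sketch("10X", "ATCGATCGATCGACTAAATTTTTTT")
bc_dropseq = extract_barcode_sketch("dropseq", "ATCGATCGATCGACTAAATTTTTTT")
indrop_read = "A" * 8 + "C" * 22 + "G" * 8 + "T" * 6
bc_indrop = extract_barcode_sketch("indrop", indrop_read, bc1_len=8)
```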

Correct & stats barcode¶

baseq.drops.whitelist.read_whitelist(protocol)[source]¶: Read the whitelist. The whitelist location is taken from the config file: Drops/whitelistDir

baseq.drops.whitelist.whitelist_check(bc_white, protocol, barcode)[source]¶: Check whitelist…

baseq.drops.barcode.stats.valid_barcode(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='bc.stats.txt')[source]¶

Aggregate the mismatched barcodes and get the total_reads:

Read the barcode counts file;
Correct barcodes with a 1bp mismatch;
Collect stats on the reads and sequences of the mismatched barcodes;
Determine whether the last base is mutated (A/T/C/G appear at similar ratios at the last base);
Filter by whitelist;
Filter by read counts (>=min_reads);
Print the number of barcodes and reads retained after each step.

Usage:

from baseq.drops.barcode.stats import valid_barcode
valid_barcode("10X", "bc.counts.txt",
    max_cell=10000, min_reads=2000, output="bc.stats.txt")

This writes a CSV file (bc.stats.txt in the call above) which contains:

barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base
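The 1bp mismatch aggregation can be sketched as below. This is one plausible reading of the steps, not the baseq implementation: barcodes are visited from most to least abundant, each one absorbs the reads of any remaining barcode at Hamming distance 1, and the result is filtered on the aggregated total. The helper names and the tie-breaking order are assumptions, and the whitelist and last-base checks are left out.

```python
def hamming1_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence that differs from `barcode` at exactly one base."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]

def aggregate_mismatches(counts, min_reads):
    """Merge 1bp-mismatch barcodes into their most abundant neighbor."""
    # Most abundant first; ties broken alphabetically for reproducibility.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    absorbed = set()
    stats = {}
    for bc, n in ranked:
        if bc in absorbed:
            continue
        mismatch_reads = 0
        mismatch_bc = 0
        for nb in hamming1_neighbors(bc):
            # Skip neighbors that are absent, already merged, or already
            # accepted as barcodes in their own right.
            if nb in counts and nb not in absorbed and nb not in stats:
                mismatch_reads += counts[nb]
                mismatch_bc += 1
                absorbed.add(nb)
        stats[bc] = {
            "counts": n,
            "mismatch_reads": mismatch_reads,
            "mismatch_bc": mismatch_bc,
            "total_reads": n + mismatch_reads,
        }
    return {bc: s for bc, s in stats.items() if s["total_reads"] >= min_reads}

counts = {"AAAA": 100, "AAAT": 5, "CAAA": 3, "GGGG": 50, "GGGC": 1, "TTTT": 2}
result = aggregate_mismatches(counts, min_reads=10)
```

In this toy input, AAAT and CAAA are folded into AAAA, GGGC into GGGG, and TTTT is dropped by the read-count filter.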

Barcode Split¶

baseq.drops.barcode.split.getUMI(protocol, barcode, seq1, mutate_last_base)[source]¶

Get the UMI from the raw read…

10X: 16-26
indrop: seq1[len(barcode) + 22:len(barcode) + 22 + 6]
dropseq: 11-19/12-20
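The UMI positions above can be sketched as plain slices. The function name `get_umi_sketch` is hypothetical, the ranges are read as zero-based half-open slices, and the dropseq branch assumes that 11-19 applies when the last barcode base is flagged as mutated and 12-20 otherwise; the source lists both ranges without saying which case each belongs to.

```python
def get_umi_sketch(protocol, barcode, seq1, mutate_last_base=False):
    """Slice the UMI out of read 1 (illustrative, not the baseq code)."""
    if protocol == "10X":
        return seq1[16:26]
    if protocol == "indrop":
        start = len(barcode) + 22
        return seq1[start:start + 6]
    if protocol == "dropseq":
        # Assumption: a mutated last barcode base shifts the UMI one base left.
        return seq1[11:19] if mutate_last_base else seq1[12:20]
    return ""

umi_10x = get_umi_sketch("10X", "A" * 16, "A" * 16 + "CCCCCGGGGG" + "T" * 5)
dropseq_read = "A" * 12 + "CGCGCGCG" + "TTTT"
umi_dropseq = get_umi_sketch("dropseq", "A" * 12, dropseq_read)
umi_dropseq_shifted = get_umi_sketch("dropseq", "A" * 12, dropseq_read,
                                     mutate_last_base=True)
indrop_read = "A" * 8 + "C" * 22 + "G" * 8 + "TTTTTT" + "AA"
umi_indrop = get_umi_sketch("indrop", "A" * 8 + "G" * 8, indrop_read)
```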

baseq.drops.barcode.split.split(name, protocol, bcstats, fq1, fq2, dir, topreads=10)[source]¶

Split the reads into 16 files according to the valid barcodes in the bcstats file.

Determine whether the last base mutates;
Filter by whitelist;
read barcode stats…
build barcode correction table…
build barcode_mutate_last list…

Usage:

from baseq.drops.barcode.split import split
split("10X_1", "10X", "bc.stats.txt", "10X_1.1.fq.gz", "10X_1.2.fq.gz", "./10X_1", topreads=10)

Result:: The split reads will be written to XXXX/split.[AA…..GG].fa
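The 16-way split suggested by the file suffixes AA through GG can be sketched as bucketing on the first two bases of the corrected barcode. Everything here is an assumption for illustration: the bucketing key, the `correction` lookup table, the FASTA header layout, and keeping the buckets in memory instead of writing files.

```python
from collections import defaultdict
from itertools import product

# 16 two-base prefixes, ordered so that AA comes first and GG last.
PREFIXES = ["".join(p) for p in product("ATCG", repeat=2)]

def bucket_by_prefix(records, correction):
    """Group reads by the first two bases of their corrected barcode.

    records: iterable of (barcode, umi, cdna_seq) tuples.
    correction: dict mapping an observed barcode to its valid barcode;
    reads whose barcode is not in the table are dropped.
    """
    buckets = defaultdict(list)
    for i, (barcode, umi, seq) in enumerate(records):
        valid = correction.get(barcode)
        if valid is None:
            continue
        # Hypothetical header layout: barcode, UMI, and read index.
        buckets[valid[:2]].append(f">{valid}_{umi}_{i}\n{seq}")
    return buckets

correction = {"AAAA": "AAAA", "AAAT": "AAAA", "GGCC": "GGCC"}
records = [
    ("AAAA", "TTT", "ACGTACGT"),
    ("AAAT", "CCC", "TTTTGGGG"),
    ("GGCC", "AAA", "CCCCAAAA"),
    ("TTTT", "GGG", "ACACACAC"),
]
buckets = bucket_by_prefix(records, correction)
```

Bucketing by a fixed prefix keeps all reads of one cell in the same file, so each file can be processed independently downstream.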

Reads Alignment¶

baseq.drops.run_star.run_multiple(bc_dir, workdir, sample, genome, parallel)[source]¶

Run STAR alignments…. Usage:

from baseq.drops.run_star import run_multiple
run_multiple(bc_dir, workdir, sample, genome, parallel=4)

Results:

Aligned.AA.bam

Reads Tagging¶

baseq.drops.tag_gene.tagging_reads(genome, bam, outpath)[source]¶: Tagging reads maps the genomic position of each read to a gene name. Report the genes for a …
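Mapping a genomic position to a gene name can be sketched with a sorted interval index and a binary search. This is a generic technique, not necessarily how tagging_reads works: the gene names and coordinates are made up, intervals are treated as inclusive, and genes on the same chromosome are assumed not to overlap.

```python
from bisect import bisect_right

def build_index(genes):
    """Build a per-chromosome index from (chrom, start, end, name) tuples."""
    index = {}
    for chrom, start, end, name in sorted(genes):
        starts, ends, names = index.setdefault(chrom, ([], [], []))
        starts.append(start)
        ends.append(end)
        names.append(name)
    return index

def tag_position(index, chrom, pos):
    """Return the gene covering `pos`, or None if the position is intergenic."""
    if chrom not in index:
        return None
    starts, ends, names = index[chrom]
    # Rightmost gene whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos <= ends[i]:
        return names[i]
    return None

# Hypothetical annotation used only for this example.
genes = [
    ("chr1", 500, 900, "GENE_B"),
    ("chr1", 100, 200, "GENE_A"),
    ("chr2", 50, 80, "GENE_C"),
]
index = build_index(genes)
tags = [tag_position(index, c, p) for c, p in
        [("chr1", 150), ("chr1", 300), ("chr1", 500), ("chr2", 10)]]
```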

Alternative Polyadenylation¶

baseq.drops.apa.scaner.scan(bam, name, chr, start, end, min_depth=10)[source]¶: Scan the genome…

baseq.drops.apa.samples.APA_usage(bamfile, APA_sitefile, celltype, gene)[source]¶

Get the abundance for each cell barcode for each APA in the gene.

Parameters:	bamfile – the BAM file. APA_sitefile – the APA site file. celltype – (optional) The celltype file generated from cellranger

Returns:

Generates a heatmap; prints the read count.

baseq.drops.apa.genes.scan_genes(genome, bam, name)[source]¶


baseq.drops.apa.UTR.scan_utr(genome, bam, name)[source]¶

For a genome:

Read the gencode annotation and get the longest UTR for each gene (>=1000bp);
Apply the ‘scan’ function to each UTR (default 20 threads…);
Call the peaks for each UTR;
Build and write the APA peaks for all the genes.
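The first step, picking the longest UTR per gene, can be sketched as follows. The function name, the input tuple layout, the inclusive coordinates, and the reading of ">=1000bp" as a minimum UTR length are all assumptions; the example records are invented.

```python
def longest_utrs(utrs, min_length=1000):
    """Keep the longest UTR of each gene, ignoring UTRs under min_length.

    utrs: iterable of (gene, chrom, start, end) with inclusive coordinates.
    """
    best = {}
    for gene, chrom, start, end in utrs:
        length = end - start + 1
        if length < min_length:
            continue
        if gene not in best or length > best[gene][2] - best[gene][1] + 1:
            best[gene] = (chrom, start, end)
    return best

# Hypothetical UTR records used only for this example.
utrs = [
    ("GENE_A", "chr1", 1000, 1499),   # 500 bp, too short
    ("GENE_A", "chr1", 1000, 2999),   # 2000 bp, kept
    ("GENE_B", "chr2", 10, 500),      # 491 bp, too short
]
selected = longest_utrs(utrs)
```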
