DropsRNA
Design
The key questions in the Drops pipeline design are:
- How to split the reads by barcode;
- How to tag reads with genes from their genome position.
Functions
Count barcode
- baseq.drops.barcode.count.count_barcodes(path, output, protocol, min_reads, topreads=100)
Count the occurrences of each barcode.
Usage:
from baseq.drops.barcode.count import count_barcodes
count_barcodes("10X.1.fq.gz", "bc.counts.txt", "10X", min_reads=50, topreads=1000)
Return:
A barcode count file in CSV format (bc.counts.txt in the example above) with the columns: cellbarcode, counts
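The counting step can be illustrated with a minimal sketch. It is not the baseq implementation: `count_barcode_sketch`, its `barcode_len` parameter, and the in-memory list of reads are hypothetical, and it assumes that `min_reads` acts as a lower bound on the reported counts.

```python
from collections import Counter

def count_barcode_sketch(read_seqs, barcode_len=16, min_reads=1):
    """Count barcode prefixes in an iterable of read sequences.

    Hypothetical helper for illustration only; the real function reads a
    FASTQ file and writes a CSV file instead of returning a dict.
    """
    counts = Counter(
        seq[:barcode_len] for seq in read_seqs if len(seq) >= barcode_len
    )
    # Assumption: barcodes seen fewer than min_reads times are dropped.
    return {bc: n for bc, n in counts.items() if n >= min_reads}

reads = [
    "AAAACCCCGGGGTTTTACGT",
    "AAAACCCCGGGGTTTTTTTT",
    "CCCCAAAAGGGGTTTTACGT",
]
barcode_counts = count_barcode_sketch(reads, barcode_len=16, min_reads=1)
```

Here the first two reads share the 16-base prefix, so that barcode is counted twice.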
- baseq.drops.barcode.count.extract_barcode(protocol, seq)
Extract the cell barcode from a read:
- 10X: seq[0:16]
- indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is the length of barcode 1)
- dropseq: seq[0:12]
Usage:
from baseq.drops.barcode.count import extract_barcode
extract_barcode("10X", "ATCGATCGATCGACTAAATTTTTTT")
Result:
- barcode: the barcode sequence; if there is no valid barcode, "" is returned.
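The slicing rules above can be sketched as follows. This is an illustration under stated assumptions, not the library code: `extract_barcode_sketch` and its `bc1_len` parameter are hypothetical (how baseq finds the length of barcode 1 for indrop is not described here), and a read that is too short is treated as having no valid barcode.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=None):
    """Slice the cell barcode out of a read using the per-protocol rules.

    bc1_len is a hypothetical argument standing in for ``i``, the length
    of barcode 1 in indrop reads.
    """
    if protocol == "10X":
        barcode, expected = seq[0:16], 16
    elif protocol == "dropseq":
        barcode, expected = seq[0:12], 12
    elif protocol == "indrop":
        if bc1_len is None:
            return ""
        # Barcode 1, then skip 22 bases, then the 8-base barcode 2.
        barcode = seq[0:bc1_len] + seq[bc1_len + 22 : bc1_len + 22 + 8]
        expected = bc1_len + 8
    else:
        return ""
    # Assumption: a truncated slice means the read has no valid barcode.
    return barcode if len(barcode) == expected else ""

tenx_barcode = extract_barcode_sketch("10X", "ATCGATCGATCGACTAAATTTTTTT")
indrop_read = "A" * 8 + "C" * 22 + "G" * 8 + "T" * 6
indrop_barcode = extract_barcode_sketch("indrop", indrop_read, bc1_len=8)
```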
Correct & stats barcode
- baseq.drops.whitelist.read_whitelist(protocol)
Read the whitelist. The whitelist is located through the config file: Drops/whitelistDir
- baseq.drops.barcode.stats.valid_barcode(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='bc.stats.txt')
Aggregate the mismatched barcodes and get the total reads per barcode:
- Read the barcode counts file;
- Correct barcodes with a 1 bp mismatch;
- Collect the reads and sequences of the mismatched barcodes;
- Determine whether the last base is mutated (A/T/C/G appear at similar ratios at the last base);
- Filter by whitelist;
- Filter by read counts (>= min_reads);
- Print the number of barcodes and reads retained after each step.
Usage:
from baseq.drops.barcode.stats import valid_barcode
valid_barcode("10X", "bc.counts.txt", max_cell=10000, min_reads=2000, output="bc.stats.txt")
This writes a barcode stats file in CSV format (bc.stats.txt in the example above) with the columns:
barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base
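The correction and filtering steps can be sketched roughly as below. This is a simplified illustration, not the baseq algorithm: the function names are hypothetical, the whitelist is applied before the mismatch correction for brevity, ambiguous corrections are skipped by assumption, and the last-base check is left out.

```python
def one_mismatch_neighbors(barcode):
    """Yield every sequence differing from barcode at exactly one position."""
    for pos, base in enumerate(barcode):
        for alt in "ACGT":
            if alt != base:
                yield barcode[:pos] + alt + barcode[pos + 1:]

def aggregate_counts_sketch(counts, whitelist, min_reads):
    """Fold 1-mismatch barcodes into whitelisted ones, then filter by reads."""
    stats = {
        bc: {"counts": n, "mismatch_reads": 0, "mismatch_bc": []}
        for bc, n in counts.items()
        if bc in whitelist
    }
    for bc, n in counts.items():
        if bc in whitelist:
            continue
        parents = [nb for nb in one_mismatch_neighbors(bc) if nb in stats]
        # Assumption: only unambiguous corrections (one parent) are applied.
        if len(parents) == 1:
            stats[parents[0]]["mismatch_reads"] += n
            stats[parents[0]]["mismatch_bc"].append(bc)
    return {
        bc: s for bc, s in stats.items()
        if s["counts"] + s["mismatch_reads"] >= min_reads
    }

raw_counts = {"AAAA": 10, "AAAT": 3, "CCCC": 1, "GGGG": 5}
valid = aggregate_counts_sketch(raw_counts, {"AAAA", "CCCC"}, min_reads=5)
```

In this toy input, `AAAT` is one mismatch away from the whitelisted `AAAA`, so its 3 reads are credited to `AAAA`; `CCCC` is whitelisted but falls below `min_reads`.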
Barcode Split
- baseq.drops.barcode.split.getUMI(protocol, barcode, seq1, mutate_last_base)
Get the UMI from the raw read…
- 10X: 16-26
- indrop: seq1[len(barcode) + 22:len(barcode) + 22 + 6]
- dropseq: 11-19/12-20
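A possible reading of these UMI positions is sketched below. Two assumptions are made that the text above does not confirm: the ranges are 0-based and half-open (so 16-26 means `seq1[16:26]`), and for dropseq the 11-19 range applies when `mutate_last_base` is true while 12-20 applies otherwise. `get_umi_sketch` is a hypothetical name.

```python
def get_umi_sketch(protocol, barcode, seq1, mutate_last_base=False):
    """Slice the UMI out of read 1 (illustrative, see assumptions above)."""
    if protocol == "10X":
        return seq1[16:26]
    if protocol == "indrop":
        start = len(barcode) + 22
        return seq1[start:start + 6]
    if protocol == "dropseq":
        # Assumption: a mutated last barcode base shifts the UMI by one.
        start = 11 if mutate_last_base else 12
        return seq1[start:start + 8]
    return ""

tenx_umi = get_umi_sketch("10X", "A" * 16, "A" * 16 + "CCCCCGGGGG" + "TTTTT")
dropseq_umi = get_umi_sketch("dropseq", "A" * 12, "A" * 12 + "CGCGCGCG" + "TT")
```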
- baseq.drops.barcode.split.split(name, protocol, bcstats, fq1, fq2, dir, topreads=10)
Split the reads into 16 files according to the valid barcodes in the bcstats file.
- Determine whether the last base mutates;
- Filter by whitelist;
- read barcode stats…
- build barcode correction table…
- build barcode_mutate_last list…
Usage:
from baseq.drops.barcode.split import split
split("10X_1", "10X", "bc.stats.txt", "10X_1.1.fq.gz", "10X_1.2.fq.gz", "./10X_1", topreads=10)
Result:
- The split reads will be written to XXXX/split.[AA…..GG].fa
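One way to arrive at 16 output files is to bin reads by a two-base key, as sketched here. This is an assumption for illustration: the text above does not say which two bases define the bin, so the sketch uses the first two bases of the corrected barcode, and `split_path_sketch` is a hypothetical helper.

```python
from itertools import product

# The 16 two-base keys, ordered AA ... GG.
PREFIXES = ["".join(pair) for pair in product("ATCG", repeat=2)]

def split_path_sketch(outdir, barcode):
    """Return the output file for a barcode, or None if it cannot be binned."""
    prefix = barcode[:2]
    if prefix not in PREFIXES:
        return None  # e.g. barcodes containing N
    return f"{outdir}/split.{prefix}.fa"

example_path = split_path_sketch("./10X_1", "ACGTACGTACGTACGT")
```

Binning by a short key keeps the number of open file handles small while still spreading the reads across files of similar size.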
Reads Alignment
Reads Tagging
- baseq.drops.tag_gene.tagging_reads(genome, bam, outpath)
Tagging reads transforms the genomic position of each read into a gene name. Report the genes for a …
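The position-to-gene lookup can be illustrated with a small interval index. This sketch is not how baseq implements tagging: `build_index` and `tag_position` are hypothetical helpers, the gene coordinates are made up, and genes on one chromosome are assumed not to overlap.

```python
import bisect

def build_index(genes):
    """Index (chrom, start, end, name) records by chromosome, sorted by start."""
    index = {}
    for chrom, start, end, name in sorted(genes):
        starts, spans = index.setdefault(chrom, ([], []))
        starts.append(start)
        spans.append((end, name))
    return index

def tag_position(index, chrom, pos):
    """Return the gene whose [start, end] interval contains pos, else None."""
    if chrom not in index:
        return None
    starts, spans = index[chrom]
    i = bisect.bisect_right(starts, pos) - 1  # last gene starting at or before pos
    if i >= 0 and pos <= spans[i][0]:
        return spans[i][1]
    return None

gene_index = build_index([
    ("chr1", 100, 200, "GENE_A"),
    ("chr1", 300, 400, "GENE_B"),
    ("chr2", 50, 80, "GENE_C"),
])
tagged = tag_position(gene_index, "chr1", 150)
```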
Alternative Polyadenylation
- baseq.drops.apa.samples.APA_usage(bamfile, APA_sitefile, celltype, gene)
Get the abundance for each cell barcode for each APA site in the gene.
Parameters:
- bamfile – the input BAM file
- APA_sitefile – the file of APA sites
- celltype – (optional) The celltype file generated from cellranger
Returns:
- Generate a heatmap; print the read count.
- baseq.drops.apa.genes.scan_genes(genome, bam, name)
- baseq.drops.apa.UTR.scan_utr(genome, bam, name)
For a genome:
- Read the GENCODE annotation and get the longest UTR for each gene (>= 1000 bp);
- Apply the 'scan' function to each UTR (default 20 threads…);
- Call the peaks for each UTR;
- Build and write the APA peaks for all the genes.