Drops:¶
Design¶
The key questions for the Drops pipeline design:
- Splitting the reads by barcode;
- Tagging reads according to their genome position.
Todos¶
- To be completed
APIs¶
Extract, Count barcode¶
-
baseq.drops.barcode.count.
count_barcodes
(path, output, protocol, min_reads, topreads=100)[source]¶ Count the number of reads for each barcode.
Parameters: - path – Path to the FASTQ file.
- output – The stats will be written to …
- protocol – Protocol (10X/indrop/dropseq).
- min_reads – Minimum number of reads.
- topreads – Process at most N million reads.
- Return:
- A barcode_count file is generated, containing cellbarcode/counts.
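The counting step can be pictured with a minimal sketch. This is not the baseq implementation: the function name `count_barcodes_sketch`, the `barcode_len` argument and the in-memory FASTQ are assumptions made for illustration, and the barcode is simply taken as a fixed-length prefix of each read.

```python
import io
from collections import Counter

def count_barcodes_sketch(fastq_handle, barcode_len=16, min_reads=1):
    """Count fixed-length barcode prefixes in a FASTQ stream (illustrative only)."""
    counts = Counter()
    for line_no, line in enumerate(fastq_handle):
        # In a 4-line FASTQ record the sequence is the second line.
        if line_no % 4 == 1:
            counts[line.strip()[:barcode_len]] += 1
    # Keep only barcodes seen at least min_reads times.
    return {bc: n for bc, n in counts.items() if n >= min_reads}

# Hypothetical three-read FASTQ; two reads share the same 16 bp prefix.
fastq_text = (
    "@r1\nAAAACCCCGGGGTTTTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@r2\nAAAACCCCGGGGTTTTTTTT\n+\nIIIIIIIIIIIIIIIIIIII\n"
    "@r3\nTTTTGGGGCCCCAAAAACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
)
counts = count_barcodes_sketch(io.StringIO(fastq_text), barcode_len=16, min_reads=2)
print(counts)
```

With `min_reads=2`, only the barcode shared by the first two reads is kept.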
-
baseq.drops.barcode.count.
extract_barcode
(protocol, seq)[source]¶ Extract cell barcode from reads
- 10X: seq[0:16]
- indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is length of barcode 1)
- dropseq: seq[0:12]
Parameters: - protocol – 10X/indrop/dropseq.
- seq – The sequence containing the cell barcode.
- Return:
- barcode: the extracted barcode, or “” if no valid barcode is found.
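The slicing rules listed above can be written out as a short sketch. It is not the library's code: `extract_barcode_sketch` and its `bc1_len` argument are hypothetical (the documentation does not say how the length `i` of inDrop barcode 1 is determined), and treating a too-short read as invalid is an assumption.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=8):
    """Apply the documented slices; bc1_len is a hypothetical stand-in for i."""
    if protocol == "10X":
        barcode, expected = seq[0:16], 16
    elif protocol == "dropseq":
        barcode, expected = seq[0:12], 12
    elif protocol == "indrop":
        i = bc1_len
        # Barcode 1, then skip 22 bases, then an 8-base barcode 2.
        barcode, expected = seq[0:i] + seq[i + 22 : i + 22 + 8], i + 8
    else:
        return ""
    # Assumption: a read too short to yield the full barcode is invalid.
    return barcode if len(barcode) == expected else ""

# Hypothetical inDrop read: 8 bp barcode 1, 22 bp spacer, 8 bp barcode 2, 5 extra bases.
indrop_read = "A" * 8 + "G" * 22 + "C" * 8 + "T" * 5
print(extract_barcode_sketch("indrop", indrop_read, bc1_len=8))
print(extract_barcode_sketch("10X", "ACGT" * 5))
```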
Correct barcodes, barcode stats¶
-
baseq.drops.barcode.stats.
valid_barcode
(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='./bc_stats.txt')[source]¶ Aggregate the mismatched barcodes and get the total_reads:
- Read the barcode counts file;
- Correct barcodes with a 1bp mismatch;
- Record the reads and sequences of the mismatched barcodes;
- Determine whether the last base is mutated (A/T/C/G appear at similar ratios at the last base);
- Filter by whitelist;
- Filter by read counts (>=min_reads);
- Print the number of barcodes and reads retained after each step.
Parameters: - protocol – 10X/Dropseq/inDrop.
- barcode_count – The barcode_count file.
- min_reads – Minimum number of reads for a cell.
- output – Path or name of the output (./bc_stats.txt)
- Return:
- Writes a bc_stats.csv file which contains: barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base
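The 1bp-mismatch aggregation can be sketched as follows. This is a simplified illustration, not the baseq code: the helper names are hypothetical, the whitelist and last-base checks are omitted, and folding a barcode into its parent only when exactly one parent exists is an assumption.

```python
def hamming_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence that differs from barcode at exactly one position."""
    for pos, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:pos] + alt + barcode[pos + 1:]

def merge_mismatches(counts, min_reads):
    """Fold low-count barcodes into a unique 1-mismatch neighbor passing min_reads."""
    valid = {bc for bc, n in counts.items() if n >= min_reads}
    stats = {bc: {"counts": counts[bc], "mismatch_reads": 0, "mismatch_bc": []}
             for bc in valid}
    for bc, n in counts.items():
        if bc in valid:
            continue
        parents = [nb for nb in hamming_neighbors(bc) if nb in valid]
        # Assumption: only unambiguous mismatches are reassigned.
        if len(parents) == 1:
            stats[parents[0]]["mismatch_reads"] += n
            stats[parents[0]]["mismatch_bc"].append(bc)
    return stats

# Hypothetical 4 bp barcodes, kept short for readability.
counts = {"AAAA": 100, "AAAT": 3, "CCCC": 50, "GGGG": 1}
stats = merge_mismatches(counts, min_reads=10)
print(stats["AAAA"])
```

Here `AAAT` is one base away from `AAAA`, so its 3 reads are recorded as mismatch reads of `AAAA`, while `GGGG` has no valid neighbor and is dropped.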
Barcode Split¶
-
baseq.drops.barcode.split.
split_16
(name, protocol, bcstats, fq1, fq2, dir, topreads=10)[source]¶ Split the reads into 16 files according to the valid barcodes in the bcstats file.
- Determine whether the last base mutates;
- Filter by whitelist;
Parameters: - protocol – 10X/Dropseq/inDrop.
- name – barcode_count.
- bcstats – Valid Barcode.
- output – (./bc_stats.txt)
- Return:
- The split reads will be written to XXXX/split.AA.fa
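One way to arrive at 16 output files is to bucket reads by the first two bases of their barcode (4 × 4 = 16 prefixes), which the `split.AA.fa` file name suggests but the documentation does not state. The sketch below works under that assumption; `bucket_reads`, its arguments and the in-memory records are hypothetical, and no files are written.

```python
from collections import defaultdict
from itertools import product

# The 16 two-base prefixes: AA, AC, AG, AT, CA, ..., TT.
PREFIXES = ["".join(p) for p in product("ACGT", repeat=2)]

def bucket_reads(records, valid_barcodes, barcode_len=16):
    """Group (barcode_read, cdna_read) pairs by the barcode's first two bases."""
    buckets = defaultdict(list)
    for barcode_read, cdna_read in records:
        bc = barcode_read[:barcode_len]
        # Reads whose barcode is not in the valid set are discarded.
        if bc in valid_barcodes:
            buckets[bc[:2]].append((bc, cdna_read))
    return buckets

# Hypothetical records with 4 bp barcodes, kept short for readability.
records = [("AACGTTTT", "read1"), ("TTGCAAAA", "read2"), ("AAGGCCCC", "read3")]
buckets = bucket_reads(records, valid_barcodes={"AACG", "TTGC"}, barcode_len=4)
print(dict(buckets))
```

Each bucket would then correspond to one output file, so that all reads of a given cell land in the same file.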
Reads Tagging¶
Alternative Polyadenylation¶
-
baseq.drops.apa.samples.
APA_usage
(bamfile, APA_sitefile, celltype, gene)[source]¶ Get the abundance for each cell barcode for each APA in the gene.
Parameters: - bamfile – The BAM file.
- APA_sitefile – The APA site file.
- celltype – (optional) The celltype file generated from cellranger
- Returns:
- Generate a heatmap; Print the Read cou
-
baseq.drops.apa.genes.
scan_genes
(genome, bam, name)[source]¶ Example function with types documented in the docstring.
- Args:
- param1 (int): The first parameter. param2 (str): The second parameter.
- Examples:
- Examples should be written in doctest format, and should illustrate how to use the function.
- Returns:
- bool: The return value. True for success, False otherwise.
-
baseq.drops.apa.UTR.
scan_utr
(genome, bam, name)[source]¶ For a genome:
- Read the GENCODE annotation and get the longest UTR for each gene (>=1000bp);
- Apply the ‘scan’ function to each UTR (default 20 threads…);
- Call the peaks for each UTR;
- Build and write the APA peaks for all the genes.