Drops:

Design

The key questions for the Drops pipeline design:

  • Split the reads by barcode;
  • Tag reads by genome position.

Todos

  • To be completed

APIs

Extract and count barcodes

baseq.drops.barcode.count.count_barcodes(path, output, protocol, min_reads, topreads=100)[source]

Count the number of occurrences of each barcode.

Parameters:
  • path – FASTQ file.
  • output – The stats will be written to …
  • protocol – Protocol.
  • min_reads – Minimum number of reads.
  • topreads – Process at most N million reads.
Return:
A barcode_count file will be generated (cellbarcode/counts).
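
The counting step can be illustrated with a minimal sketch. This is not the baseq implementation: the function names `count_barcodes_sketch` and `filter_by_min_reads`, the fixed `barcode_len`, and the `max_reads` cap are hypothetical stand-ins chosen for illustration, and the barcode is assumed to sit at the start of each read.

```python
import gzip
from collections import Counter
from itertools import islice


def count_barcodes_sketch(fastq_path, barcode_len=16, max_reads=None):
    """Count the leading `barcode_len` bases of each read (hypothetical helper)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # A FASTQ record spans four lines; the sequence is the second one.
        sequences = islice(handle, 1, None, 4)
        if max_reads is not None:
            # Assumption: cap the number of reads, similar in spirit to topreads.
            sequences = islice(sequences, max_reads)
        for line in sequences:
            counts[line.strip()[:barcode_len]] += 1
    return counts


def filter_by_min_reads(counts, min_reads):
    """Keep only barcodes seen at least `min_reads` times."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}
```

Reading every fourth line avoids a FASTQ parser dependency, at the cost of assuming well-formed four-line records.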
baseq.drops.barcode.count.extract_barcode(protocol, seq)[source]

Extract the cell barcode from a read sequence.

  • 10X: seq[0:16]
  • indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is the length of barcode 1)
  • dropseq: seq[0:12]
Parameters:
  • protocol – 10X/indrop/drop-seq.
  • seq – The sequence containing cellbarcode.
Return:
barcode: the extracted barcode, or “” if there is no valid barcode.
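
The slicing rules listed above can be sketched as follows. This is an illustration only, not the library code: `extract_barcode_sketch` and its `bc1_len` argument are hypothetical, and how baseq determines the length of barcode 1 for inDrop is not stated here, so the sketch takes it as an input. Treating a read that is too short as "no valid barcode" is also an assumption.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=None):
    """Slice a cell barcode out of `seq` following the documented offsets.

    `bc1_len` (hypothetical) is the length of inDrop barcode 1, i.e. `i`.
    Returns "" when the read is too short or the protocol is unknown.
    """
    name = protocol.lower().replace("-", "")
    if name == "10x":
        return seq[0:16] if len(seq) >= 16 else ""
    if name == "dropseq":
        return seq[0:12] if len(seq) >= 12 else ""
    if name == "indrop":
        if bc1_len is None:
            return ""
        i = bc1_len
        if len(seq) < i + 22 + 8:
            return ""
        # Barcode 1, then skip 22 bases, then the 8-base barcode 2.
        return seq[0:i] + seq[i + 22 : i + 22 + 8]
    return ""
```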

Correct barcodes and compute statistics

baseq.drops.barcode.stats.valid_barcode(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='./bc_stats.txt')[source]

Aggregate the mismatched barcodes and get the total reads:

  1. Read the barcode counts file;
  2. Correct barcodes with a 1 bp mismatch;
  3. Compute the reads and sequences of the mismatched barcodes;
  4. Determine whether the last base is mutated (A/T/C/G appear at similar ratios at the last base);
  5. Filter by whitelist;
  6. Filter by read counts (>= min_reads);
  7. Print the number of barcodes and reads retained after each step.
Parameters:
  • protocol – 10X/Dropseq/inDrop.
  • barcode_count – The barcode_count file.
  • min_reads – Minimum number of reads for a cell.
  • output – Path or name of the output file (default ./bc_stats.txt).
Return:
Writes a bc_stats.csv file containing: barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base
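
One way to picture the 1 bp mismatch correction is the sketch below. It is not the baseq algorithm: the names `one_mismatch_neighbors` and `merge_one_mismatch` are hypothetical, and the rule used here (a low-count barcode is merged only when exactly one barcode with at least `seed_min_reads` reads lies at Hamming distance 1) is an assumption made for illustration.

```python
from collections import Counter


def one_mismatch_neighbors(barcode):
    """Yield every sequence that differs from `barcode` at exactly one base."""
    for pos, base in enumerate(barcode):
        for alt in "ACGT":
            if alt != base:
                yield barcode[:pos] + alt + barcode[pos + 1:]


def merge_one_mismatch(counts, seed_min_reads):
    """Fold low-count barcodes into a unique 1-mismatch neighbor.

    Returns (merged_counts, mismatch_reads); both are keyed by seed barcode.
    """
    seeds = {bc for bc, n in counts.items() if n >= seed_min_reads}
    merged = Counter({bc: counts[bc] for bc in seeds})
    mismatch_reads = Counter()
    for bc, n in counts.items():
        if bc in seeds:
            continue
        parents = [nb for nb in one_mismatch_neighbors(bc) if nb in seeds]
        # Ambiguous barcodes (zero or several candidate parents) are dropped.
        if len(parents) == 1:
            merged[parents[0]] += n
            mismatch_reads[parents[0]] += n
    return merged, mismatch_reads
```

Enumerating the neighbors of each rare barcode keeps the cost linear in the number of barcodes, rather than comparing all pairs.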

Barcode Split

baseq.drops.barcode.split.split_16(name, protocol, bcstats, fq1, fq2, dir, topreads=10)[source]

Split the reads into 16 files according to the valid barcodes in the bcstats file.

  1. Determine whether the last base mutates;
  2. Filter by whitelist;
Parameters:
  • protocol – 10X/Dropseq/inDrop.
  • name – barcode_count.
  • bcstats – Valid barcodes.
  • output – (./bc_stats.txt)
Return:
The split reads will be written to XXXX/split.AA.fa
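
A 16-way split is consistent with bucketing barcodes by a two-base prefix (4 × 4 = 16), which would also match a file name such as split.AA.fa. That reading is an assumption, not something stated here, and `bucket_by_prefix` below is a hypothetical helper that only shows the grouping, not the file writing.

```python
from itertools import product


def bucket_by_prefix(barcodes):
    """Group barcodes into 16 buckets keyed by their first two bases."""
    buckets = {"".join(pair): [] for pair in product("ACGT", repeat=2)}
    for barcode in barcodes:
        prefix = barcode[:2]
        # Barcodes whose prefix contains N or other symbols are skipped.
        if prefix in buckets:
            buckets[prefix].append(barcode)
    return buckets
```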

Reads Tagging

Alternative Polyadenylation

baseq.drops.apa.scaner.scan(bam, name, chr, start, end, min_depth=10)[source]

SCAN THE GENOME…

baseq.drops.apa.samples.APA_usage(bamfile, APA_sitefile, celltype, gene)[source]

Get the abundance for each cell barcode for each APA in the gene.

Parameters:
  • bamfile – BAM file.
  • APA_sitefile – APA site file.
  • celltype – (optional) The celltype file generated from cellranger.
Returns:
Generate a heatmap; Print the Read cou
baseq.drops.apa.genes.scan_genes(genome, bam, name)[source]

baseq.drops.apa.UTR.scan_utr(genome, bam, name)[source]

For a genome, read the GENCODE annotation and get the longest UTR for each gene (>=1000bp). Apply the ‘scan’ function to each UTR (default 20 threads…). Call the peaks for each UTR. Build and write the APA peaks for all the genes.
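
The "longest UTR per gene" selection can be sketched as below. The input layout (tuples of gene, start, end with half-open coordinates) and the name `longest_utr_per_gene` are assumptions made for illustration; the actual function reads the annotation itself and its internal data structures are not described here.

```python
def longest_utr_per_gene(utrs, min_length=1000):
    """Pick the longest UTR of each gene, ignoring UTRs shorter than `min_length`.

    `utrs` is an iterable of (gene, start, end) tuples (hypothetical layout).
    """
    best = {}
    for gene, start, end in utrs:
        length = end - start
        if length < min_length:
            continue
        if gene not in best or length > best[gene][1] - best[gene][0]:
            best[gene] = (start, end)
    return best
```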
