DropsRNA

Design

The key questions in the Drops pipeline design are:

  • Splitting the reads by barcode;
  • Tagging reads by their genome position.

Functions

Count barcode

baseq.drops.barcode.count.count_barcodes(path, output, protocol, min_reads, topreads=100)[source]

Count the number of reads for each barcode.

from baseq.drops.barcode.count import count_barcodes
count_barcodes("10X.1.fq.gz", "bc.counts.txt", "10X", min_reads=50, topreads=1000)

Returns:

# A barcode count file in CSV format: bc.counts.txt
cellbarcode, counts

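
The counting step can be pictured with a minimal sketch. It tallies barcodes from an in-memory list of read sequences rather than a FASTQ file. The helper name count_barcodes_sketch, the bc_len parameter and the fixed 16-base slice are assumptions made for illustration; this is not the baseq implementation.

```python
from collections import Counter


def count_barcodes_sketch(seqs, min_reads=1, bc_len=16):
    """Hypothetical helper: count the leading bc_len bases of each read."""
    counts = Counter(seq[:bc_len] for seq in seqs if len(seq) >= bc_len)
    # Keep only barcodes seen at least min_reads times, most frequent first.
    return [(bc, n) for bc, n in counts.most_common() if n >= min_reads]


reads = [
    "AAAACCCCGGGGTTTTACGT",
    "AAAACCCCGGGGTTTTTTTT",
    "CCCCAAAAGGGGTTTTACGT",
]
# Print rows in the same "cellbarcode, counts" layout as the output file.
for barcode, n in count_barcodes_sketch(reads):
    print(f"{barcode},{n}")
```
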
baseq.drops.barcode.count.extract_barcode(protocol, seq)[source]

Extract the cell barcode from a read:

  • 10X: seq[0:16]
  • indrop: seq[0:i] + seq[i + 22 : i + 22 + 8] (i is the length of barcode 1)
  • dropseq: seq[0:12]

Usage:

from baseq.drops.barcode.count import extract_barcode
extract_barcode("10X", "ATCGATCGATCGACTAAATTTTTTT")
Result:
barcode: the barcode sequence; if no valid barcode is found, an empty string ("") is returned
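
The slicing rules listed above can be restated as a small sketch. The function name extract_barcode_sketch and the bc1_len argument are hypothetical. How the real function determines the length i of indrop barcode 1 is not documented here, so the sketch simply asks the caller for it.

```python
def extract_barcode_sketch(protocol, seq, bc1_len=None):
    """Hypothetical restatement of the documented slicing rules."""
    if protocol == "10X":
        return seq[0:16]
    if protocol == "dropseq":
        return seq[0:12]
    if protocol == "indrop":
        # Assumption: the caller supplies i, the length of barcode 1.
        if bc1_len is None:
            return ""
        i = bc1_len
        return seq[0:i] + seq[i + 22 : i + 22 + 8]
    # Unknown protocol: no valid barcode.
    return ""


read = "ATCGATCGATCGACTAAATTTTTTT"
print(extract_barcode_sketch("10X", read))      # first 16 bases
print(extract_barcode_sketch("dropseq", read))  # first 12 bases
```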

Correct & stats barcode

baseq.drops.whitelist.read_whitelist(protocol)[source]

Read the whitelist. The whitelist is taken from the config file: Drops/whitelistDir

baseq.drops.whitelist.whitelist_check(bc_white, protocol, barcode)[source]

Check whitelist…

baseq.drops.barcode.stats.valid_barcode(protocol='', barcode_count='', max_cell=10000, min_reads=2000, output='bc.stats.txt')[source]

Aggregate the mismatched barcodes and get the total_reads:

  1. Read the barcode counts file;
  2. Correct barcodes that have a 1 bp mismatch;
  3. Collect the reads and sequences of the mismatched barcodes;
  4. Determine whether the last base mutates (A/T/C/G appear at similar ratios at the last base);
  5. Filter by whitelist;
  6. Filter by read counts (>= min_reads);
  7. Print the number of barcodes and reads retained after each step.
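
Steps 2, 3 and 6 can be sketched as follows. This is one plausible way to fold 1 bp-mismatch barcodes into their more abundant neighbours, not the baseq code. The helper names are hypothetical, the whitelist and last-base checks are omitted, and applying the min_reads filter to the aggregated total is an assumption.

```python
def hamming1_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence differing from barcode at exactly one position."""
    for pos, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:pos] + alt + barcode[pos + 1:]


def aggregate_mismatches(counts, min_reads):
    """Hypothetical sketch: merge 1 bp-mismatch barcodes into abundant ones."""
    # Visit barcodes from most to least abundant so parents come first.
    ordered = sorted(counts, key=lambda bc: (-counts[bc], bc))
    absorbed = set()
    stats = {}
    for bc in ordered:
        if bc in absorbed:
            continue
        mismatch_reads = 0
        mismatch_bc = []
        for nb in hamming1_neighbors(bc):
            # Only absorb observed barcodes that are not parents themselves.
            if nb in counts and nb not in absorbed and nb not in stats:
                absorbed.add(nb)
                mismatch_reads += counts[nb]
                mismatch_bc.append(nb)
        stats[bc] = {
            "counts": counts[bc],
            "mismatch_reads": mismatch_reads,
            "mismatch_bc": mismatch_bc,
            "total_reads": counts[bc] + mismatch_reads,
        }
    # Assumption: the read-count filter uses the aggregated total.
    return {bc: s for bc, s in stats.items() if s["total_reads"] >= min_reads}


counts = {"AAAA": 100, "AAAT": 5, "CCCC": 50, "GGGG": 1}
for bc, s in aggregate_mismatches(counts, min_reads=10).items():
    print(bc, s["counts"], s["mismatch_reads"], s["mismatch_bc"])
```

Processing barcodes in descending order of abundance means a low-count barcode is only ever merged into a barcode with at least as many reads.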

Usage:

from baseq.drops.barcode.stats import valid_barcode
valid_barcode("10X", "bc.counts.txt",
    max_cell=10000, min_reads=2000, output="bc.stats.txt")

This writes the barcode stats file in CSV format (named by output; bc.stats.txt in the call above), which contains:

barcode/counts/mismatch_reads/mismatch_bc/mutate_last_base

Barcode Split

baseq.drops.barcode.split.getUMI(protocol, barcode, seq1, mutate_last_base)[source]

Get the UMI from the raw read…

  • 10X: 16-26
  • indrop: seq1[len(barcode) + 22:len(barcode) + 22 + 6]
  • dropseq: 11-19/12-20

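
A minimal sketch of these UMI positions, under two assumptions that the source does not confirm. The ranges "16-26", "11-19" and "12-20" are read as Python half-open slices. For dropseq, the 11-19 range is assumed to apply when mutate_last_base is set and 12-20 otherwise. The function name get_umi_sketch is hypothetical.

```python
def get_umi_sketch(protocol, barcode, seq1, mutate_last_base=False):
    """Hypothetical restatement of the documented UMI positions."""
    if protocol == "10X":
        # Assumption: "16-26" means the slice seq1[16:26].
        return seq1[16:26]
    if protocol == "indrop":
        start = len(barcode) + 22
        return seq1[start:start + 6]
    if protocol == "dropseq":
        # Assumption: shift one base left when the last barcode base mutates.
        return seq1[11:19] if mutate_last_base else seq1[12:20]
    return ""


tenx_read = "A" * 16 + "C" * 10 + "T" * 4
print(get_umi_sketch("10X", tenx_read[:16], tenx_read))
```
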
baseq.drops.barcode.split.split(name, protocol, bcstats, fq1, fq2, dir, topreads=10)[source]

Split the reads into 16 files according to the valid barcodes in the bcstats file.

  • Determine whether the last base mutates;
  • Filter by whitelist;
  • read barcode stats…
  • build barcode correction table…
  • build barcode_mutate_last list…

Usage:

from baseq.drops.barcode.split import split
split("10X_1", "10X", "bc.stats.txt", "10X_1.1.fq.gz", "10X_1.2.fq.gz", "./10X_1", topreads=10)
Result:
The split reads will be written to XXXX/split.[AA…..GG].fa
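
One way to picture the 16-way split is to bucket each read by a two-base key. The sketch below assumes the key is the first two bases of the cell barcode, which the source does not confirm. The names PREFIXES, split_bucket and bucket_reads are hypothetical, and the sketch groups reads in memory rather than writing files.

```python
from itertools import product

# The 16 possible two-base keys over A/C/G/T.
PREFIXES = ["".join(p) for p in product("ACGT", repeat=2)]


def split_bucket(barcode):
    """Hypothetical rule: bucket a barcode by its first two bases."""
    prefix = barcode[:2]
    return prefix if prefix in PREFIXES else None


def bucket_reads(records):
    """Group (barcode, read) pairs into 16 buckets; skip ambiguous barcodes."""
    buckets = {prefix: [] for prefix in PREFIXES}
    for barcode, read in records:
        key = split_bucket(barcode)
        if key is not None:
            buckets[key].append((barcode, read))
    return buckets


records = [("ACGTACGT", "read1"), ("ACTTTTTT", "read2"), ("NAGTACGT", "read3")]
buckets = bucket_reads(records)
print(len(buckets), len(buckets["AC"]))
```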

Reads Alignment

baseq.drops.run_star.run_multiple(bc_dir, workdir, sample, genome, parallel)[source]

Run STAR alignments….

Usage:

from baseq.drops.run_star import run_multiple
run_multiple(bc_dir, workdir, sample, genome, parallel=4)

Results:

Aligned.AA.bam

Reads Tagging

baseq.drops.tag_gene.tagging_reads(genome, bam, outpath)[source]

Tagging reads maps the genomic position of each read to a gene name. Report the genes for a …
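
The position-to-gene lookup can be illustrated with a sorted interval table and a binary search. This is a simplified sketch, not the baseq implementation. It assumes half-open, non-overlapping gene intervals per chromosome and does not read a BAM file. The helper names and the gene names GENE_A and GENE_B are made up for the example.

```python
import bisect


def build_index(genes):
    """genes: iterable of (chrom, start, end, name); assumed non-overlapping."""
    index = {}
    for chrom, start, end, name in genes:
        index.setdefault(chrom, []).append((start, end, name))
    for intervals in index.values():
        intervals.sort()
    return index


def tag_position(index, chrom, pos):
    """Return the gene covering pos (start <= pos < end), or None."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect.bisect_right(intervals, (pos, float("inf"), "")) - 1
    if i >= 0:
        start, end, name = intervals[i]
        if start <= pos < end:
            return name
    return None


# Hypothetical annotation, for illustration only.
genes = [("chr1", 100, 200, "GENE_A"), ("chr1", 300, 400, "GENE_B")]
index = build_index(genes)
print(tag_position(index, "chr1", 150))
```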

Alternative Polyadenylation

baseq.drops.apa.scaner.scan(bam, name, chr, start, end, min_depth=10)[source]

SCAN THE GENOME…

baseq.drops.apa.samples.APA_usage(bamfile, APA_sitefile, celltype, gene)[source]

Get the abundance for each cell barcode for each APA in the gene.

Parameters:
  • bamfile – the input BAM file.
  • APA_sitefile – the file listing the APA sites.
  • celltype – (optional) the cell type file generated by cellranger
Returns:
Generates a heatmap and prints the read counts.

baseq.drops.apa.genes.scan_genes(genome, bam, name)[source]

baseq.drops.apa.UTR.scan_utr(genome, bam, name)[source]

For a genome:

  1. Read the GENCODE annotation and get the longest UTR for each gene (>=1000bp);
  2. Apply the ‘scan’ function to each UTR (default 20 threads…);
  3. Call the peaks for each UTR;
  4. Build and write the APA peaks for all the genes.
