atac¶
The atac command exposes the functionality of alevin-fry for processing RAD files containing scATAC-seq data. The atac command sets the mode of alevin-fry, and it takes one of several sub-commands (generate-permit-list and sort being the primary ones).
generate-permit-list (atac)¶
This command takes as input a piscem output directory containing a RAD file, and it determines which cell barcodes should be associated with “true” cells, which should be corrected to some “true” barcode, and which should simply be ignored / discarded.
This command has 4 required arguments: the path to an input directory (--input), the path to an output directory (--output-dir, which will be created if it doesn’t exist), and a path to the barcode permit-list file. The permit-list argument works as follows:
--unfiltered-pl <plist>: This option accepts as an argument a list of possible barcodes for the sample. For example, this is the flag you should use if you wish to provide an “external permit list”, like the 10x v2 or 10x v3 permit lists. Unlike with the --valid-bc flag, the list passed to this argument is the set of all possible barcodes for the technology being processed, and it is likely that most of the barcodes in the file will not correspond to cells present in this particular sample. When using this argument, you may also pass the --min-reads argument to set the minimum number of times a barcode must be seen in order to be retained.
The algorithm passes over the input records (mapped reads) and counts how many times each barcode in the unfiltered permit list occurs exactly. Any barcode occurring >= min-reads times is considered a present cell. Subsequently, every barcode that did not match a present cell is searched (at an edit distance of up to 1) against the barcodes determined to correspond to present cells. If an initially non-matching barcode has a unique neighbor among the barcodes for present cells, it is corrected to that barcode; if it has no 1-edit neighbor, or if it has 2 or more 1-edit neighbors among that list (i.e. its correction would be ambiguous), the record is discarded.
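The filtering and correction steps described above can be illustrated with a small Python sketch. This is not alevin-fry's implementation: the function names are hypothetical, barcodes are plain strings rather than records in a RAD file, and "edit distance of up to 1" is assumed here to mean a single-base substitution over the alphabet ACGT.

```python
from collections import Counter


def one_substitution_neighbors(barcode, alphabet="ACGT"):
    """Yield every barcode that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in alphabet:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def build_permit_map(observed, unfiltered_permit_list, min_reads):
    """Return (present, permit_map) for a list of observed barcodes.

    `present` holds permit-list barcodes seen exactly at least `min_reads` times.
    `permit_map` maps each retained barcode to the present barcode it resolves to;
    barcodes absent from the map are discarded.
    """
    permitted = set(unfiltered_permit_list)
    # Count exact matches against the unfiltered permit list.
    counts = Counter(bc for bc in observed if bc in permitted)
    present = {bc for bc, n in counts.items() if n >= min_reads}

    # Present barcodes map to themselves.
    permit_map = {bc: bc for bc in present}

    # Try to correct every other observed barcode (assumption: substitutions only).
    for bc in set(observed) - present:
        hits = [nb for nb in one_substitution_neighbors(bc) if nb in present]
        if len(hits) == 1:  # unique neighbor: correct to it
            permit_map[bc] = hits[0]
        # zero or multiple neighbors: leave out of the map, i.e. discard

    return present, permit_map


if __name__ == "__main__":
    observed = ["AAAA"] * 3 + ["AAAC"] * 3 + ["TAAA", "AAAG", "GGGG"]
    permit_list = ["AAAA", "AAAC", "GGGG", "TTTT"]
    present, permit_map = build_permit_map(observed, permit_list, min_reads=2)
    print(sorted(present))
    print(sorted(permit_map.items()))
```

In this toy input, TAAA has a single present neighbor and is corrected, AAAG sits one substitution away from two present barcodes and is dropped as ambiguous, and GGGG is on the permit list but falls below the threshold and has no present neighbor.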
output¶
The generate-permit-list command outputs a number of different files in the output directory. Not all files are relevant to users of alevin-fry, but the files are described here.
The file bin_lens.bin is a binary file that records the lengths of the bins used for creating temporary files for sorting.
The file bin_recs.bin is a binary file that encodes where records should be routed during the sorting phase.
The file permit_freq.bin is a binary file that encodes information about the frequency of occurrence of different barcodes in the permit list.
The file permit_map.bin is a binary file (a serde serialized HashMap) that maps each barcode in the input RAD file that is within an edit distance of 1 of some true barcode to the barcode to which it corrects. This allows the collate command to group together all of the read records corresponding to the same corrected barcode.
The file generate_permit_list.json is a JSON file containing information about the run of the command.
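As a quick sanity check after a run, one might confirm which of the files listed above are present and look at the run metadata. The following is a minimal sketch; the helper name inspect_permit_list_dir is hypothetical, and because the keys of generate_permit_list.json are not documented here, the JSON is loaded and returned as-is rather than interpreted.

```python
import json
from pathlib import Path

# File names as listed in this section.
EXPECTED_FILES = [
    "bin_lens.bin",
    "bin_recs.bin",
    "permit_freq.bin",
    "permit_map.bin",
    "generate_permit_list.json",
]


def inspect_permit_list_dir(output_dir):
    """Report which expected files exist and load the run metadata if present."""
    out = Path(output_dir)
    present = {name: (out / name).is_file() for name in EXPECTED_FILES}

    run_info = None
    meta_path = out / "generate_permit_list.json"
    if meta_path.is_file():
        with meta_path.open() as fh:
            run_info = json.load(fh)  # contents are not interpreted here

    return present, run_info
```

The binary files are deliberately left unopened, since their layout is internal to alevin-fry.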
sort (atac)¶
This command takes as input the directory containing the original RAD file (created by piscem) and the output directory generated by the generate-permit-list command above. It parses the input RAD file, buckets and then sorts the records by genomic location, and produces a globally-sorted BED file for downstream analysis. The process is highly multi-threaded, and the number of threads can be chosen by passing the appropriate value to the --threads option. The output BED file can optionally be compressed by passing the --compress flag to the sort command. The output of the sort command is described below.
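The bucket-then-sort idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not alevin-fry's implementation: the record layout (chromosome, start, end, barcode), the fixed bin width, and the function name are all assumptions made for this example, and the real command works on RAD records with temporary files and multiple threads.

```python
from collections import defaultdict


def bucket_then_sort(records, chrom_order, bin_width=1_000_000):
    """Return records in global genomic order via bucketing plus per-bucket sorts.

    `records` is an iterable of (chrom, start, end, barcode) tuples and
    `chrom_order` lists chromosomes in the desired output order.
    """
    chrom_rank = {chrom: i for i, chrom in enumerate(chrom_order)}

    # Route each record to a bucket keyed by (chromosome rank, genomic bin).
    buckets = defaultdict(list)
    for chrom, start, end, barcode in records:
        key = (chrom_rank[chrom], start // bin_width)
        buckets[key].append((chrom, start, end, barcode))

    # Buckets are disjoint genomic ranges, so sorting each one independently
    # and concatenating them in key order yields a globally sorted result.
    ordered = []
    for key in sorted(buckets):
        ordered.extend(sorted(buckets[key], key=lambda r: (r[1], r[2], r[3])))
    return ordered
```

Because every bucket covers a distinct genomic range, the per-bucket sorts are independent of one another, which is what makes this kind of scheme straightforward to parallelize.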
output¶
The sort command outputs the following files:
The sort.json file is a JSON file containing information about how the sort command was run.
The map.bed file (or map.bed.gz if the --compress flag was passed) contains the output in BED format, which can be provided to a peak caller like MACS.
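For a quick look at the result before handing it to a peak caller, the BED output can be read with a few lines of Python. The sketch below relies only on the standard BED convention that the first three tab-separated columns are chromosome, start, and end; any further columns are passed through untouched because their meaning is not described here. The function name iter_bed_records is hypothetical.

```python
import gzip


def iter_bed_records(path):
    """Yield (chrom, start, end, extra_fields) from a plain or gzipped BED file."""
    # Choose the opener from the file name: map.bed.gz vs. map.bed.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split("\t")
            yield fields[0], int(fields[1]), int(fields[2]), fields[3:]
```

A typical use would be to iterate over map.bed (or map.bed.gz) and check that consecutive records on the same chromosome have non-decreasing start coordinates.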