collate
This command takes as input a directory containing a RAD file (created by running alevin with the --justAlign and/or --sketch flags), as well as the directory generated as the result of running the generate-permit-list command of alevin-fry, and it will produce an output RAD file that is collated by (corrected) cellular barcode. The collated RAD file can then be quantified with the alevin-fry quant command. It also takes two other arguments (described below) that dictate how the collation and filtering will be performed.
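As a rough orientation, a collate invocation can be assembled from the flags documented in this section. The sketch below only builds the argument list and does not run anything. The helper name build_collate_command and the example directory names are hypothetical, and it assumes the alevin-fry executable is on your PATH under that name.

```python
from typing import List, Optional


def build_collate_command(
    rad_dir: str,
    input_dir: str,
    max_records: Optional[int] = None,
    compress: bool = False,
) -> List[str]:
    """Assemble an argv list for `alevin-fry collate` (hypothetical helper).

    Only flags described in this section are used.
    """
    cmd = ["alevin-fry", "collate", "--rad-dir", rad_dir, "--input-dir", input_dir]
    if max_records is not None:
        # Omit the flag entirely to fall back on the tool's default.
        cmd += ["--max-records", str(max_records)]
    if compress:
        cmd.append("--compress")
    return cmd


if __name__ == "__main__":
    # "alevin_map" and "permit_out" are placeholder directory names.
    print(" ".join(build_collate_command("alevin_map", "permit_out", compress=True)))
```

The resulting list could be passed to subprocess.run if you want to launch the command from a script.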
-r, --rad-dir <rad-dir>
: The directory containing the RAD file to be collated. This is the same directory on which you previously ran generate-permit-list, and that was obtained by running alevin with the --justAlign flag.
-i, --input-dir <input-dir>
: The input directory. This is the directory that was the output of generate-permit-list. This directory contains information computed by the generate-permit-list command that will allow successful collation and barcode correction. This is also the directory where the collated RAD file will be output.
--compress
: This optional flag tells alevin-fry to compress the output collated RAD file. The file is compressed using the Snappy compression format (via the excellent snap crate). If this option is passed, the output file is written to map.collated.rad.sz rather than map.collated.rad, and the compression status of the file is recorded in collate.json in the output directory. Note: The choice to use compression has no effect on the final result or the correctness of the output, but it may have moderate performance implications. Specifically, this flag is potentially worth using if you want to minimize disk space and you are using a sufficiently large number of threads (compression happens in parallel, so with enough threads the compressed RAD file can be generated as quickly as the uncompressed one). However, because some internal buffers must be duplicated during parallel compression, the collate step can use somewhat more memory when run with the --compress flag, though the memory usage should still be small and stable across inputs of different sizes. There can also be an effect on quantification speed (since the collated RAD file is decompressed on the fly during quantification), but it should be small, since Snappy decompresses very quickly; decompression will only be the limiting factor if you are using a simple resolution strategy (e.g. naive or cr-like) and many quantification threads.
-m, --max-records <max-records>
: The maximum number of read records to keep in memory at once during collation. The collate command will pass over the input RAD file multiple times, collecting the records associated with a set of (corrected) cellular barcodes so that they can be written out in collated format to the output RAD file. This parameter determines (approximately) how many records will be held in memory at once, and therefore determines the memory usage of the collate command. The larger the value, the faster the collation process will be, since fewer passes are made. The smaller the value, the lower the memory usage will be, at the cost of more passes. The default value is 30,000,000. Note that this determines the number of records only approximately, because a specific barcode will never be split across multiple collation passes. The algorithm employed is to collect the reads associated with different cellular barcodes in the current pass until the number of reads to be collected first exceeds this value.
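The pass-planning rule described for --max-records can be illustrated with a small sketch. This is not the alevin-fry implementation (which is written in Rust); it is a minimal Python model under stated assumptions: per-barcode record counts are known up front, barcodes are visited in dictionary order, and a pass is closed as soon as its running total becomes strictly greater than the limit. The function name plan_passes is hypothetical.

```python
from typing import Dict, List

DEFAULT_MAX_RECORDS = 30_000_000  # default value stated in the text


def plan_passes(
    record_counts: Dict[str, int],
    max_records: int = DEFAULT_MAX_RECORDS,
) -> List[List[str]]:
    """Group barcodes into collation passes (illustrative model only).

    A barcode is never split across passes, so a pass may hold somewhat
    more than `max_records` records: barcodes are added until the running
    total first exceeds the limit, and then the pass is closed.
    """
    passes: List[List[str]] = []
    current: List[str] = []
    current_total = 0
    for barcode, n_records in record_counts.items():
        current.append(barcode)
        current_total += n_records
        if current_total > max_records:  # assumed comparison: strictly greater
            passes.append(current)
            current, current_total = [], 0
    if current:
        # Flush the final, partially filled pass.
        passes.append(current)
    return passes


if __name__ == "__main__":
    counts = {"A": 4, "B": 3, "C": 5, "D": 1}
    print(plan_passes(counts, max_records=6))    # [['A', 'B'], ['C', 'D']]
    print(plan_passes(counts, max_records=100))  # [['A', 'B', 'C', 'D']]
```

In this toy example the first pass holds 7 records even though the limit is 6, which is why the limit is only approximate, and raising the limit reduces the number of passes from two to one.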
output
The collate command writes all of the files it creates, in the expected format, to the output directory. It will write a file named map.collated.rad (or map.collated.rad.sz if run with the --compress flag), one named unmapped_bc_count_collated.bin, and one named collate.json in the directory specified by -i.
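After a run, it can be useful to confirm that the files listed above are present before moving on to quantification. The following is a minimal sketch using only the file names given in this section. The helper names are hypothetical, and because the contents of collate.json are not described here, the sketch simply loads the file as generic JSON without assuming any particular keys.

```python
import json
from pathlib import Path
from typing import List


def expected_collate_outputs(input_dir: str, compressed: bool = False) -> List[Path]:
    """Return the paths collate is documented to write into `input_dir`."""
    base = Path(input_dir)
    rad_name = "map.collated.rad.sz" if compressed else "map.collated.rad"
    return [
        base / rad_name,
        base / "unmapped_bc_count_collated.bin",
        base / "collate.json",
    ]


def missing_collate_outputs(input_dir: str, compressed: bool = False) -> List[Path]:
    """List the expected output files that do not exist yet."""
    return [p for p in expected_collate_outputs(input_dir, compressed) if not p.is_file()]


def load_collate_metadata(input_dir: str):
    """Parse collate.json as generic JSON (its schema is not assumed)."""
    with open(Path(input_dir) / "collate.json", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, calling missing_collate_outputs with the -i directory and compressed=True should return an empty list after a successful run with --compress.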