This command takes as input a directory containing a RAD file (created by running alevin with the --sketch flags), as well as the directory generated by running the generate-permit-list command of alevin-fry. It produces an output RAD file that is collated by (corrected) cellular barcode. The collated RAD file can then be quantified with the quant command. It also takes two other arguments (described below) that dictate how the collation and filtering will be performed.
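As a rough illustration of how these inputs fit together, the following Python sketch assembles a collate invocation from the flags documented below. The helper name build_collate_command and the example paths are hypothetical and exist only for illustration; only the flag names come from this page.

```python
from typing import List, Optional


def build_collate_command(
    input_dir: str,
    rad_dir: str,
    max_records: Optional[int] = None,
    compress: bool = False,
) -> List[str]:
    """Assemble an argument list for `alevin-fry collate`.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    cmd = [
        "alevin-fry", "collate",
        "--input-dir", input_dir,  # output directory of generate-permit-list
        "--rad-dir", rad_dir,      # directory holding the RAD file to collate
    ]
    if max_records is not None:
        # Approximate number of records held in memory at once.
        cmd += ["--max-records", str(max_records)]
    if compress:
        # Ask for a Snappy-compressed collated RAD file.
        cmd.append("--compress")
    return cmd


# Example with made-up directory names.
example = build_collate_command("permit_out", "map_out", max_records=10_000_000)
```

The resulting list can be handed to subprocess.run(example, check=True) on a machine where alevin-fry is installed. Leaving max_records unset omits the flag, so the tool falls back to its own default.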
-r, --rad-dir <rad-dir>: The directory containing the RAD file to be collated. This is the same directory on which you have previously run generate-permit-list, and that was obtained by running alevin.
-i, --input-dir <input-dir>: The input directory. This is the directory that was the output of generate-permit-list. It contains information computed by the generate-permit-list command that allows successful collation and barcode correction. This is also the directory where the collated RAD file will be written.
--compress: This optional flag tells alevin-fry to compress the output collated RAD file. The file will be compressed using the Snappy compression format (via the excellent snap crate). If this option is passed, the output file will be written to map.collated.rad.sz, and the corresponding status of the file’s compression will be written to collate.json in the output directory. Note: The choice to use compression or not has no effect on the final result or the correctness of the output, but it may have some moderate performance implications. Specifically, it is potentially worth using this flag if you want to minimize disk space and you are using a sufficiently large number of threads (compression happens in parallel, so a sufficient number of threads will allow the compressed RAD file to be generated as quickly as the uncompressed one). However, because some internal buffers must be duplicated during parallel compression, the collate step can use a bit more memory if run with the --compress flag, though the memory usage should still be small and stable over different sized inputs. There can also be an effect on quantification speed (since the collated RAD file will be decompressed on the fly during quantification), but it should be small, since Snappy decompresses very fast, and decompression will only be the limiting factor if you are using a simple resolution strategy (e.g. naive or cr-like) and many quantification threads.
-m, --max-records <max-records>: The maximum number of read records to keep in memory at once during collation. The collate command will pass over the input RAD file multiple times, collecting the records associated with a set of (corrected) cellular barcodes so that they can be written out in collated format to the output RAD file. This parameter determines (approximately) how many records will be held in memory at once, and therefore determines the memory usage of the collate command. The larger the value, the faster the collation process will be, since fewer passes are made. The smaller the value, the lower the memory usage will be, at the cost of more passes. The default value is 30,000,000. Note that this determines the number of records only approximately, because a specific barcode will never be split across multiple collation passes. The algorithm employed is to collect the reads associated with different cellular barcodes in the current pass until the number of reads to be collected first exceeds this value.
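The batching rule described for --max-records can be sketched in a few lines of Python. This is a minimal illustration of the stated rule, not the actual alevin-fry implementation: the function name plan_passes, the per-barcode counts, and the barcode ordering are all assumptions made for the example.

```python
from typing import Dict, List


def plan_passes(records_per_barcode: Dict[str, int], max_records: int) -> List[List[str]]:
    """Group barcodes into collation passes (illustrative sketch only).

    Barcodes are added to the current pass until its record count first
    exceeds max_records; a barcode is never split across two passes.
    """
    passes: List[List[str]] = []
    current: List[str] = []
    current_total = 0
    for barcode, n_records in records_per_barcode.items():
        current.append(barcode)
        current_total += n_records
        if current_total > max_records:
            # The limit was just exceeded: close this pass and start a new one.
            passes.append(current)
            current = []
            current_total = 0
    if current:
        # Whatever remains forms the final, possibly smaller, pass.
        passes.append(current)
    return passes


# Made-up counts: five barcodes and a limit of 6 records per pass.
counts = {"AAAC": 4, "AACG": 3, "ACGT": 5, "CGTA": 2, "GTAC": 1}
plan = plan_passes(counts, max_records=6)
```

With these made-up counts the first two passes each hold 7 records, one more than the limit of 6, which is why the parameter bounds memory only approximately. A smaller limit produces more passes over the input, matching the trade-off described above.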
The collate command writes all the files it creates, in the expected format, to the specified output directory. It will write a file named map.collated.rad (or map.collated.rad.sz if run with the --compress flag), one named unmapped_bc_count_collated.bin, and one named
collate.json in the directory specified by