SQD
executing unix commands with multiple processes
installation
$ npm install -g sqd
usage
$ sqd -c command [--debug] [--exit] [-p nProcess] [-s separator_command] <input file> [output file]
grep with 8 processes
sqd -c "grep -e something" -p 8 input.txt
results are written to STDOUT.
sed with 4 processes (default), results to output.txt
sqd -c "sed -e y/ATCG/atcg/" input.txt output.txt
with separator option, we can also handle binary files
sqd -c "samtools view -" -s bam input.bam
reducing
sqd -c "node foobar.js" sample.txt --reduce
in foobar.js
if (process.env.sqd_map) {
  process.stdin.on("data", function(data) {
    // do something, separated into multiple processes
  });
}
else if (process.env.sqd_reduce) {
  process.stdin.on("data", function(data) {
    // do something which reduces the results
  });
}
process.stdin.resume()
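A fuller version of foobar.js might look like the sketch below, which counts lines in the map step and sums the counts in the reduce step. It assumes that the reducing process receives the concatenated STDOUT of the map processes on its STDIN, which is not stated above. The helper names countLines, sumCounts and runOnStdin are hypothetical.

```javascript
// foobar.js: a hypothetical line counter used as both mapper and reducer.

// Map step: count newline characters in one piece of text.
function countLines(text) {
  let count = 0;
  for (const ch of text) {
    if (ch === "\n") count += 1;
  }
  return count;
}

// Reduce step: sum the per-process counts, one integer per line.
// Assumption: the reducer sees the map outputs concatenated on STDIN.
function sumCounts(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .reduce((total, line) => total + parseInt(line, 10), 0);
}

// Collect all of STDIN, then write handler(input) followed by a newline.
function runOnStdin(handler) {
  let input = "";
  process.stdin.setEncoding("utf8");
  process.stdin.on("data", (chunk) => { input += chunk; });
  process.stdin.on("end", () => {
    process.stdout.write(String(handler(input)) + "\n");
  });
}

if (process.env.sqd_map) {
  runOnStdin(countLines);
} else if (process.env.sqd_reduce) {
  runOnStdin(sumCounts);
}
```

Buffering the whole input before handling it keeps the example short; for large chunks, counting inside the "data" handler would use less memory.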
options
- -p: the number of processes
- --debug: debug mode (showing time, temporary files)
- --exit: exits when child processes emit an error or emit to stderr
- -s: (see separator section)
- --reduce: reduces the results with the same command, which is given an environment variable named sqd_reduce with value "1"
additional environment variables in child processes
Note that all values are passed as strings.
- sqd_n: process number assigned by sqd, differs among child processes
- sqd_start: start position of the file passed to the child process
- sqd_end: end position of the file passed to the child process
- sqd_command: command string (common)
- sqd_input: input file name (common)
- sqd_tmpfile: path to the tmpfile used in the child process
- sqd_debug: debug mode or not. '1' or '0' (common)
- sqd_hStart: start position of the header (common)
- sqd_map: "1" if the process is spawned for mapping (not reducing). undefined, otherwise
- sqd_reduce: "1" if the process is spawned for reducing. undefined, otherwise
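Since every value arrives as a string, a child process usually has to convert the numeric ones itself. The sketch below assumes the variables use the same underscore-separated names as sqd_map and sqd_reduce in foobar.js (for example sqd_start), and that the positions are integer byte offsets. readSqdEnv is a hypothetical helper, not part of sqd.

```javascript
// Parse the sqd_* variables from an environment object into typed values.
// Assumption: names follow the sqd_map / sqd_reduce pattern used in foobar.js.
function readSqdEnv(env) {
  const toInt = (value) =>
    value === undefined ? undefined : parseInt(value, 10);
  return {
    n: toInt(env.sqd_n),          // process number
    start: toInt(env.sqd_start),  // start position in the input file
    end: toInt(env.sqd_end),      // end position in the input file
    command: env.sqd_command,
    input: env.sqd_input,
    tmpfile: env.sqd_tmpfile,
    debug: env.sqd_debug === "1", // '1' or '0' as a string
    isMap: env.sqd_map === "1",
    isReduce: env.sqd_reduce === "1",
  };
}

const info = readSqdEnv(process.env);
if (info.debug) {
  // Write diagnostics to STDERR so they do not mix with the results.
  console.error(`process ${info.n}: ${info.start}-${info.end} of ${info.input}`);
}
```

Comparing against "1" instead of relying on truthiness matters for sqd_debug, because the string '0' is truthy in JavaScript.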
separator
sqd requires a separator, which splits a given input file into multiple chunks. A separator describes, in JSON format, how sqd should split the file.
The JSON keys are
- positions: start positions of each chunk in the file
"positions": [133, 271, 461, 631]
- header: range of the header section of the file, null when there is no header section
"header": [0, 133]
- size: file size (optional)
"size": 34503
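To make the three keys concrete, here is a minimal sketch that builds such an object for line-oriented data. How sqd invokes a separator command and reads its output is not described here, so the code only constructs the JSON. describeChunks and its parameters are hypothetical, and it assumes the first chunk starts where the header ends, as in the examples above.

```javascript
// Build a separator description for line-oriented data held in a Buffer.
// headerEnd is the byte offset where the header stops (0 when there is none).
function describeChunks(buffer, nChunks, headerEnd = 0) {
  const size = buffer.length;
  const bodySize = size - headerEnd;
  const positions = [];
  for (let i = 0; i < nChunks; i++) {
    // Evenly spaced candidate start, moved forward to the next line start.
    let pos = headerEnd + Math.floor((bodySize * i) / nChunks);
    if (pos > headerEnd) {
      // Searching from pos - 1 keeps pos when it already follows a newline.
      const newline = buffer.indexOf(0x0a, pos - 1);
      pos = newline === -1 ? size : newline + 1;
    }
    // Skip duplicates and empty chunks at the end of the file.
    if (pos < size && positions[positions.length - 1] !== pos) {
      positions.push(pos);
    }
  }
  return {
    positions,
    header: headerEnd > 0 ? [0, headerEnd] : null,
    size,
  };
}

const sample = Buffer.from("#header\naaa\nbbb\nccc\nddd\n");
console.log(JSON.stringify(describeChunks(sample, 2, 8)));
// prints {"positions":[8,16],"header":[0,8],"size":24}
```

Aligning every start position to a line boundary means no child process receives a record that was cut in half.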
available separators
- -s line: default. separate by line
- -s bam: see SAM/BAM format specification
- -s fastq: see fastq format
sqdm: much more memory
$ sqdm [memory=4000MB] -c command [--debug] [--exit] [-p nProcess] [-s separator_command] <input file> [output file]
sqd with 8000MB(≒8GB) memory
sqdm 8000 -c "cat" sample.txt