s3-parallel-get

Get files from s3, in parallel

Downloads in past

Stats

StarsIssuesVersionUpdatedCreatedSize
s3-parallel-get
1020.2.29 years ago9 years agoMinified + gzip package size for s3-parallel-get in KB

Readme

s3cmd is great, but too slow for downloading lots of files from s3. Instead, s3parallel-get lets you download many files concurrently, meaning you get your files quicker.
Downloading 150MB of log files split over 150 files,
s3cmd took around 100 seconds, whereas s3-parallel-get took just under 25 seconds.

Installation

Just a standard npm install,
npm install -g s3-parallel-get

Usage

export AWS_KEY=<key>
export AWS_SECRET=<secret>

s3get --bucket <bucket> --prefix <path/to/files>

s3
parallel-get
was built for fetching log files, so output is sent to stdout for piping in to other commands.
Progress is printed to stderr,
found 4 objects in 'logs/production/access.log/2014/02/22'
retrieved object #2 - PROD/access.log/2014/02/22/access.i-13e5f250-b
retrieved object #4 - PROD/access.log/2014/02/22/access.i-13e5f250-d
retrieved object #3 - PROD/access.log/2014/02/22/access.i-13e5f250-c
retrieved object #1 - PROD/access.log/2014/02/22/access.i-13e5f250-a
done

Programmatic

var s3get = require('../lib/aws-parallel-get').s3get

// Set some options
var opts = {
    bucket: program.bucket,
    prefix: program.prefix,
    key: program.key,
    secret: program.secret
});

var s3 = new s3get(opts)

// write a handler for the s3 stream
s3.on('data', function (data) {
    process.stdout.write("*" + data)
})  

// kick it off
s3.go();