"--insecure"
"--no-check-certificate"
"-o"
"--output"
"--output-document"
=======================
options (proxy server):
=======================
Note:
When options allow for an environment variable
to serve as an alternative method to specify a value,
the command-line parameter always takes precedence.
"--no-proxy"
Specify that no requests are tunneled through any proxy server.
"-x"
"--proxy"
Specify the connection URL for a proxy server.
HTTP and HTTPS requests tunnel through the same proxy server.
Format of URL:
<protocol>://<auth>@<host>:<port>
Where:
<protocol> is any of the following values:
'http','https','socks','socks4','socks4a','socks5','socks5h'
<auth> is "<username>:<password>" in clear text,
which is used to send a Basic 'Proxy-Authorization' header
<host> is required.
Defaults:
<protocol> = 'http'
<port> = 80 (http), 443 (https), 1080 (socks*)
Notes:
<auth> is URL decoded to allow for special characters.
For example, a password is allowed to contain the '@' character.
If this character is not URL encoded, then the URL is broken:
socks5://username:p@ssword@host
The correct connection URL is:
socks5://username:p%40ssword@host
Environment variable:
`proxy`
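For example, a hypothetical SOCKS5 proxy whose password
contains the '@' character (URL encoded as '%40'):
  nget --proxy 'socks5://username:p%40ssword@proxy.example.com:1080' \
    --url 'http://example.com/' -O '-'
The same value can be supplied through the environment variable:
  proxy='socks5://username:p%40ssword@proxy.example.com:1080' nget --url 'http://example.com/' -O '-'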
"--proxy-http"
Specify the connection URL for a proxy server.
HTTP requests tunnel through the proxy server.
Environment variable:
`http_proxy`
"--proxy-https"
Specify the connection URL for a proxy server.
HTTPS requests tunnel through the proxy server.
Environment variable:
`https_proxy`
======================
options (web crawler):
======================
"--spider"
This option is a convenience aggregate, which is equivalent to:
--mirror --server-response --dry-run
"-m"
"--mirror"
This option is a convenience aggregate, which is equivalent to:
-r -l 0 --trust-server-names -E -k
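For example (using a hypothetical target URL):
  nget --spider --url 'http://example.com/'
is equivalent to:
  nget -r -l 0 --trust-server-names -E -k --server-response --dry-run --url 'http://example.com/'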
"-r"
"--recursive"
Enable recursive website crawling.
When recursively crawling a website:
1. URLs that match a blacklist are not followed
2. URLs that match a whitelist are followed
3. When there is a whitelist, non-matching URLs are not followed
4. When there is no whitelist, same-host URLs are followed
All followed URLs are:
1. mirrored to the local file system
When the content-type of a followed URL is HTML or CSS:
1. the content of the response is inspected for embedded URLs
2. all embedded URLs are compared to white/black lists
3. all embedded URLs that will be followed
are rewritten as relative links to the local file system
4. all embedded URLs that will not be followed
are rewritten as absolute links to the remote host
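For example, a shallow recursive crawl of a hypothetical website,
saved under a local output directory:
  nget -r -l 2 --url 'http://example.com/docs/index.html' \
    -P '/path/to/output-directory'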
"-l"
"--level"
Specify the maximum depth for recursion.
Depth is counted as the number of "hops" from the original URL.
The default is 5, which is consistent with Wget.
For example:
--level 1
will conditionally download
only the links in the target webpage.
Special case:
--level 0
indicates infinite recursion.
"-p"
"--page-requisites"
This option applies when either:
1. recursively crawling a website with a finite maximum depth
2. downloading a single webpage without recursive crawling,
which for this purpose is equivalent to a recursive crawl
with a maximum depth of zero
In both cases, HTML documents downloaded at the maximum recursion depth
would normally contain only absolute remote URLs to their page resources,
because those resources lie one "hop" deeper than the recursion allows.
This option allows one extra "hop" for all URLs that do not return HTML content,
so that such page resources are still downloaded.
This option pairs well with: "--force-html"
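For example (hypothetical URL), download a single webpage
together with its non-HTML page resources:
  nget -p --url 'http://example.com/article.html' \
    -P '/path/to/output-directory'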
"-E"
"--adjust-extension"
Force all filenames used to save HTML content
to end with an ".html" extension.
This option pairs well with: "--force-html"
WARNING:
- Unlike Wget, which uses a two-pass methodology,
Nget uses a single-pass strategy,
which requires a deterministic way to identify
that a URL will return HTML content.
- This restriction does not apply to the actual
crawling of webpages; content-type of the server response
is used to identify HTML content for this purpose.
- This restriction does apply to the determination
of filenames, and is especially important with respect
to the ability to adjust the extension of filenames.
- The reason for this restriction is that the determination
of filenames must occur at a lower recursion depth,
while the URLs in HTML and CSS documents are identified,
and conditionally rewritten as relative links
to the local file system.
"-k"
"--convert-links"
Rewrite the URLs in HTML and CSS documents
that have been followed and mirrored,
as relative links to the local file system.
By default, all such URLs are rewritten
as absolute links to the remote host.
"-np"
"--no-parent"
Blacklist all requests with a pathname that does not descend
from the absolute directory path in the original URL pathname.
This rule only applies to URLs requested from the same host.
This option pairs well with: "--span-subdomains"
This option pairs well with: "--no-host-directories", "--cut-dirs"
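For example (hypothetical URL), restrict a mirror to the "/docs/" subtree
of the starting URL:
  nget --mirror --no-parent --url 'http://example.com/docs/index.html' \
    -P '/path/to/output-directory'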
"-xD"
"--exclude"
"--exclude-directory"
Blacklist an absolute directory path that applies only to the host
associated with the original URL for the target webpage.
No URLs that are a descendant of this directory are followed.
This flag can be repeated to blacklist multiple directory paths.
WARNING:
This option uses a non-standard alias.
Wget uses "-X" as an alias for "--exclude"
Curl uses "-X" as an alias for "--request" (the request method)
The alias is allocated for compatibility with Curl,
because "--method" is used more frequently.
"-iD"
"--include"
"--include-directory"
Whitelist an absolute directory path that applies only to the host
associated with the original URL for the target webpage.
All URLs that are a descendant of this directory are followed.
This flag can be repeated to whitelist multiple directory paths.
WARNING:
This option uses a non-standard alias.
Wget uses "-I" as an alias for "--include"
Curl uses "-I" as an alias for "--head"
The alias is allocated for compatibility with Curl,
because "--head" is used more frequently.
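For example (hypothetical website and directory paths),
to skip one directory subtree during a recursive crawl:
  nget -r --url 'http://example.com/index.html' \
    -xD '/private' -P '/path/to/output-directory'
or to restrict the crawl to a single directory subtree:
  nget -r --url 'http://example.com/index.html' \
    -iD '/docs' -P '/path/to/output-directory'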
"-sD"
"--span-subdomains"
Conditionally follow URLs hosted by any subdomain
that shares the same top two domain-name levels
as the original URL for the target webpage.
For example:
URL = "http://www1.example.com/foo.html"
follows = "http://www2.example.com/bar.html"
"-sH"
"--span-hosts"
Conditionally follow URLs hosted by any domain.
This option pairs well with: "--include-host", "--exclude-host"
This option pairs well with: "--accept-regex", "--reject-regex"
WARNING:
This option uses a non-standard alias.
Wget uses "-H" as an alias for "--span-hosts"
Curl uses "-H" as an alias for "--header"
The alias is allocated for compatibility with Curl,
because "--header" is used more frequently.
"-xH"
"--exclude-domains"
"--exclude-host"
Blacklist a case-insensitive host name.
When "--span-subdomains" is enabled:
Host names are normalized to contain only the top two domain-name levels.
This option pairs well with: "--span-hosts"
This flag can be repeated to blacklist multiple hosts.
"-iH"
"-D"
"--domains"
"--include-host"
Whitelist a case-insensitive host name.
When "--span-subdomains" is enabled:
Host names are normalized to contain only the top two domain-name levels.
This option pairs well with: "--span-hosts"
This flag can be repeated to whitelist multiple hosts.
"--reject-regex"
Specify a case-insensitive PCRE regex pattern
to blacklist absolute URLs.
This option pairs well with: "--span-hosts"
This flag can be repeated to blacklist multiple URL patterns.
"--accept-regex"
Specify a case-insensitive PCRE regex pattern
to whitelist absolute URLs.
This option pairs well with: "--span-hosts"
This flag can be repeated to whitelist multiple URL patterns.
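For example (hypothetical hosts and an illustrative regex pattern),
follow URLs hosted by any domain, but whitelist only two specific hosts:
  nget -r --span-hosts -iH 'example.com' -iH 'cdn.example.net' \
    --url 'http://example.com/index.html' \
    -P '/path/to/output-directory'
or blacklist URLs that match a regex pattern:
  nget -r --span-hosts --reject-regex '\.(zip|iso)([?#]|$)' \
    --url 'http://example.com/index.html' \
    -P '/path/to/output-directory'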
"-nd"
"--no-directories"
Disable the creation of a directory tree hierarchy.
Save all file downloads to a single output directory.
By default, crawling produces a directory structure
which mirrors that of the remote server.
This option pairs well with: "--directory-prefix"
"--protocol-directories"
Include top-level subdirectories in the resulting directory tree
named for the network protocol with which they were crawled.
For example: "http/host/path/file" or "https/host/path/file"
This option is nullified by: "--no-directories"
"-nH"
"--no-host-directories"
Exclude top-level subdirectories in the resulting directory tree
named for the host from which they were crawled.
For example: "path/file"
This option is nullified by: "--no-directories"
"--cut-dirs"
Exclude the top n-levels of subdirectories from the URL pathname.
For example: --cut-dirs 3
URL = "http://example.com/1/2/3/4/5/index.html"
default = "example.com/1/2/3/4/5/index.html"
filepath = "example.com/4/5/index.html"
This option is nullified by: "--no-directories"
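For example (hypothetical URL and output path), flatten the saved directory
tree by removing the host directory and the top 3 levels of the URL pathname:
  nget -r -nH --cut-dirs 3 \
    --url 'http://example.com/1/2/3/4/5/index.html' \
    -P '/path/to/output-directory'
which should save the file to: '/path/to/output-directory/4/5/index.html'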
"-F"
"--force-html"
Specify a case-insensitive PCRE regex pattern
to match absolute URLs.
Matching URLs are:
1. inspected for embedded URLs
2. given an ".html" file extension by "--adjust-extension"
When this option is not configured, a default pattern matches many common CGI URLs.
This flag can be repeated to match multiple URL patterns.
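For example (hypothetical URL and an illustrative regex pattern),
treat dynamic CGI URLs as HTML so that "--adjust-extension"
(enabled by "--mirror") appends an ".html" extension to their saved filenames:
  nget --mirror --force-html '\.php([?]|$)' \
    --url 'http://example.com/index.php' \
    -P '/path/to/output-directory'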
"-B"
"--base"
Specify a <base href="URL"> that is used only to resolve
relative links extracted from the target webpage;
more formally, it is only applied when the depth of recursion is exactly zero.
```
```bash
nget-convert-cookiefile --json-to-text --in --out
nget-convert-cookiefile --text-to-json --in --out
========
options:
========
"-h"
"--help"
Print a help message describing all of Nget's command-line options.
"-V"
"--version"
Display the version of Nget.
"--json-to-text"
Enable the conversion operation: JSON to Netscape text format
"--text-to-json"
Enable the conversion operation: Netscape text format to JSON
"--in"
Specify path to input file.
"--out"
Specify path to output file.
```
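For example, a minimal sketch of both conversion directions
(the file paths are hypothetical placeholders):
```bash
nget-convert-cookiefile --json-to-text --in '/path/to/cookies.json' --out '/path/to/cookies.txt'
nget-convert-cookiefile --text-to-json --in '/path/to/cookies.txt' --out '/path/to/cookies.json'
```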
Examples:
This test script is a thorough reference.
The following examples should serve as a quick reference:
```bash
nget --max-concurrency 4 --chunk-size 10 \
--url 'https://archive.org/download/BigBuckBunny124/Content/bigbuckbunny720psurround.mp4'
nget --max-concurrency 10 \
--no-check-certificate -nc --content-disposition \
-i '/path/to/input-urls.list' \
-P '/path/to/output-directory'
nget --url 'http://httpbin.org/post' --method POST \
--post-file '/path/to/file1.input' \
-O '/path/to/file1.output'
cat '/path/to/file1.input' | \
nget --url 'http://httpbin.org/post' --method POST \
--post-data 'textencoded={{+ value to urlencode}}&textdecoded={{- value%20to%20urldecode}}&binarystdin={{@ -}}&binaryfile={{@ /path/to/file2.input}}' \
-U 'nget' --header 'accept: application/json' \
-O '-' >'/path/to/file2.output'
cat '/path/to/file1.input' | \
nget --url 'http://httpbin.org/post' --method POST \
--post-data 'textencoded={{+ value to urlencode}}&textdecoded={{- value%20to%20urldecode}}&binarystdin={{@ - file1.input | text/plain}}&binaryfile={{@ /path/to/file2.input | text/plain}}' \
-U 'nget' --header 'accept: application/json' \
-O '-' >'/path/to/file3.output'
nget --mirror --url 'https://hexdocs.pm/crawler/1.1.2/api-reference.html' --no-parent \
-P '/path/to/output-directory'
nget -O '-' \
--proxy 'socks5://user:pass@proxy.example.com:1080' \
--url 'http://ipv4.ipleak.net/json/'
```
Usage as an Embedded Library:
- Without cluttering the README with too much technical detail, I will quickly mention that:
all functionality can be imported as a function:
const {download} = require('@warren-bank/node-request-cli')
all command-line options can be specified at runtime in a configuration object passed to the function:
download({})
Requirements:
- Node version: v6.4.0 (and higher)
ES6 support
* v0.12.18+: Promise
* v4.08.03+: Object shorthand methods
* v5.12.00+: spread operator
* v6.04.00+: Proxy constructor
* v6.04.00+: Proxy 'apply' handler
* v6.04.00+: Reflect.apply
tested in:
* v7.9.0
Legal: