Safely remove repeating whitespace from HTML text.
Using \s to normalize HTML whitespace will strip out characters that are actually rendered by a web browser. That is a lossy change and produces a different visual result. This package collapses runs of whitespace characters down to a single space, while ignoring the following characters:
\u00a0 (non-breaking space)
\ufeff (zero-width non-breaking space)
…as well as these lesser-known ones:
\u1680
\u180e (Mongolian vowel separator)
\u2000
\u2001
\u2002
\u2003
\u2004
\u2005
\u2006
\u2007
\u2008
\u2009
\u200a
\u2028
\u2029
\u202f
\u205f
\u3000
For the sake of completeness, the following character, which is not part of \s, will also not be affected:
\u200b (zero-width breaking space)
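The difference from a plain \s replacement can be sketched as follows. This is a minimal illustration, not the package's actual source: the function names are hypothetical, and the character class is simply what remains of \s once the characters listed above are excluded. It also assumes that only runs of two or more characters are collapsed, so a lone tab or newline is left as it is.

```javascript
// Hypothetical sketch, not the package's implementation.
// ASCII whitespace only: tab, line feed, vertical tab, form feed,
// carriage return and space. None of the excluded characters appear here.
const COLLAPSIBLE_RUN = /[\t\n\v\f\r ]{2,}/g;

function collapseAsciiWhitespace(text) {
  // Replace each run of two or more collapsible characters with one space.
  return text.replace(COLLAPSIBLE_RUN, ' ');
}

// For comparison: the lossy approach the text warns about.
function collapseWithBackslashS(text) {
  // \s also matches \u00a0, so non-breaking spaces are swallowed too.
  return text.replace(/\s{2,}/g, ' ');
}

// Two non-breaking spaces sit between "bar" and "baz".
const sample = 'foo \n\t bar\u00a0\u00a0baz';

const safe = collapseAsciiWhitespace(sample);  // non-breaking spaces survive
const lossy = collapseWithBackslashS(sample);  // they become one plain space
```

In the first result the two non-breaking spaces are still present, so the rendered gap between "bar" and "baz" is unchanged. In the second they have been replaced by a single ordinary space.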
Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
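One way to honor that restriction is to walk an already-parsed tree and pass only the text nodes to the normalizer. The sketch below is an assumption-heavy illustration: the node shape, the normalizeTextNodes helper and the list of elements to skip are all hypothetical and belong to no particular parser, and a stand-in function replaces the package's export so the example runs on its own.

```javascript
// Hypothetical node shape, not any parser's real API:
//   { type: 'text', value: string }
//   { type: 'element', name: string, children: Node[] }

// Assumed list of elements whose whitespace should be left alone.
const PRESERVE_WHITESPACE = new Set(['pre', 'textarea', 'script', 'style']);

function normalizeTextNodes(node, normalize) {
  if (node.type === 'text') {
    // Only text content is ever handed to the normalizer.
    return Object.assign({}, node, { value: normalize(node.value) });
  }
  if (node.type === 'element' && !PRESERVE_WHITESPACE.has(node.name)) {
    return Object.assign({}, node, {
      children: node.children.map(child => normalizeTextNodes(child, normalize)),
    });
  }
  // Whitespace-sensitive elements and unknown node types are returned as is.
  return node;
}

// Stand-in for the package's export (an assumption, for a runnable sketch).
const standIn = text => text.replace(/[\t\n\v\f\r ]{2,}/g, ' ');

const tree = {
  type: 'element',
  name: 'div',
  children: [
    { type: 'text', value: 'foo   bar' },
    { type: 'element', name: 'pre', children: [{ type: 'text', value: 'a   b' }] },
  ],
};

const result = normalizeTextNodes(tree, standIn);
```

The helper returns new objects instead of mutating the input, which keeps the original tree available for comparison. In real use, the function exported by this package would be passed in place of standIn.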
Installation
Node.js >= 8 is required. Type this at the command line:
npm install normalize-html-whitespace
Usage
const normalizeWhitespace = require('normalize-html-whitespace');
normalizeWhitespace(' foo bar     baz ');
//-> ' foo bar baz '