normalize-html-whitespace

Safely remove repeating whitespace from HTML text.

Using \s to normalize HTML whitespace strips characters that a web browser actually renders, such as non-breaking spaces. That is a lossy change and produces a different visual result. This package collapses runs of whitespace into a single space while leaving the following characters untouched:
  • \u00a0 or &nbsp; (non-breaking space)
  • \ufeff or &#xfeff; (zero-width non-breaking space)

…as well as these lesser-known ones:
  • \u1680 or &#x1680; (Ogham space mark)
  • \u180e or &#x180e; (Mongolian vowel separator)
  • \u2000 or &#x2000; (en quad)
  • \u2001 or &#x2001; (em quad)
  • \u2002 or &#x2002; (en space)
  • \u2003 or &#x2003; (em space)
  • \u2004 or &#x2004; (three-per-em space)
  • \u2005 or &#x2005; (four-per-em space)
  • \u2006 or &#x2006; (six-per-em space)
  • \u2007 or &#x2007; (figure space)
  • \u2008 or &#x2008; (punctuation space)
  • \u2009 or &#x2009; (thin space)
  • \u200a or &#x200a; (hair space)
  • \u2028 or &#x2028; (line separator)
  • \u2029 or &#x2029; (paragraph separator)
  • \u202f or &#x202f; (narrow non-breaking space)
  • \u205f or &#x205f; (medium mathematical space)
  • \u3000 or &#x3000; (ideographic space)

For the sake of completeness, the following character, which is not matched by \s, is also left untouched:
  • \u200b or &#x200b; (zero-width space)
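The difference between \s and a narrower character class can be seen in a minimal sketch. This is not the package's source: the character class below is an assumption based on the HTML definition of ASCII whitespace (tab, line feed, form feed, carriage return, space), and collapseAsciiWhitespace is a hypothetical name used only for illustration.

```javascript
// Illustrative sketch only; the package's real pattern may differ.
// Assumption: only HTML "ASCII whitespace" is collapsed.
const ASCII_WHITESPACE = /[\t\n\f\r ]+/g;

// Hypothetical helper: collapse runs of ASCII whitespace into one space.
function collapseAsciiWhitespace(text) {
  return text.replace(ASCII_WHITESPACE, ' ');
}

// Two non-breaking spaces, then a run of ordinary whitespace.
const input = 'a\u00a0\u00a0b  \n c';

// Non-breaking spaces survive; only the ordinary run is collapsed.
const kept = collapseAsciiWhitespace(input);
// kept === 'a\u00a0\u00a0b c'

// \s also matches \u00a0, so the rendered spacing is lost.
const lossy = input.replace(/\s+/g, ' ');
// lossy === 'a b c'
```

In a browser, the first result still shows two fixed-width gaps between "a" and "b", while the second shows one ordinary space.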

Note: this package does not contain an HTML parser. It is meant to be used on text nodes only.
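Because no parser is included, the caller decides which text nodes to normalize. The following is a rough sketch of one way to do that over an already-parsed tree. The node shape, the normalizeTextNodes helper, and the list of skipped elements are all assumptions made for illustration; none of them come from this package.

```javascript
// Assumed, illustrative tree shape (not part of this package):
//   { type: 'element', tagName: 'p', children: [...] }
//   { type: 'text', value: '...' }
// Elements whose whitespace is usually significant and is left alone here.
const PRESERVED_ELEMENTS = new Set(['pre', 'textarea', 'script', 'style']);

// Returns a copy of the tree with `normalize` applied to text nodes only.
function normalizeTextNodes(node, normalize) {
  if (node.type === 'text') {
    return { ...node, value: normalize(node.value) };
  }
  if (node.type === 'element' && !PRESERVED_ELEMENTS.has(node.tagName)) {
    const children = node.children.map(child => normalizeTextNodes(child, normalize));
    return { ...node, children };
  }
  return node; // preserved elements and unknown node types are returned as-is
}

// Stand-in normalizer so the sketch runs without dependencies;
// in real use, pass the function exported by normalize-html-whitespace.
const standInNormalize = text => text.replace(/[\t\n\f\r ]+/g, ' ');

const tree = {
  type: 'element', tagName: 'div', children: [
    { type: 'text', value: 'foo   bar' },
    { type: 'element', tagName: 'pre', children: [{ type: 'text', value: 'a   b' }] },
  ],
};

const result = normalizeTextNodes(tree, standInNormalize);
// result.children[0].value is 'foo bar'; the text inside <pre> is unchanged.
```

Passing the normalizer as an argument keeps the traversal independent of any particular parser or whitespace rule.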

Installation

Node.js >= 8 is required. Type this at the command line:
npm install normalize-html-whitespace

Usage

const normalizeWhitespace = require('normalize-html-whitespace');

normalizeWhitespace('  foo bar     baz ');
//-> ' foo bar baz '