Normalize and clean text


Normalize, clean and fix text
npm install node-normalizer
This simple app processes input and tries to make it consumable for a bot.
The order in which the processing happens is important:
  • spelling corrections for common spelling errors
  • idiom conversions
  • junk word removal from sentence
  • special sentence effects (question, exclamation, revert question)
  • abbreviation expansion and canonicalization
  • for abbreviations, do not use before the .
  • if the left side has an apostrophe, it must follow tokenizing conventions
  • if the right side has an apostrophe, the word is not spell checked and the apostrophe will disappear
  • Format is a left phrase (words joined by a separator character) that yields a right phrase (words joined by +)
  • if the right side is a %value, that bit is set on the sentence (%EXCLAMATIONMARK, %QUESTIONMARK)
  • if the right side is a ~word, it is an interjection
  • only proper names should have capital letters
  • Right phrase missing means delete left phrase
  • Substitutions files include:
  • we use + because we don't want the resulting phrase to be recognized by the idiom processor, which would cause the processor to delete the phrase
  • xxx> means the phrase xxx must be followed by the end of the sentence
  • if you want to have the result NOT tokenized, put it in quotes
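
The substitution rules above can be sketched as a small line parser. This is only an illustration under stated assumptions, not node-normalizer's actual code: `parseRule` and the sample lines are hypothetical, the left and right sides are assumed to be separated by whitespace, and `_` is assumed as the left-phrase separator, which the rules above do not state. Apostrophe handling is left out.

```javascript
// Hypothetical helper: parse one substitution line into a rule object.
// Assumptions (not confirmed by the rules above): whitespace separates the
// left side from the right side, and left-side words are joined by "_".
function parseRule(line, leftSep = "_") {
  const m = /^(\S+)(?:\s+(.*))?$/.exec(line.trim());
  if (!m) return null; // blank line, nothing to parse

  let left = m[1];
  const right = m[2] === undefined ? "" : m[2].trim();
  const rule = { words: [], atEnd: false, kind: "replace", value: "" };

  // "xxx>": the phrase must be followed by the end of the sentence.
  if (left.endsWith(">")) {
    rule.atEnd = true;
    left = left.slice(0, -1);
  }
  rule.words = left.split(leftSep);

  if (right === "") {
    rule.kind = "delete"; // missing right phrase: delete the left phrase
  } else if (right.startsWith("%")) {
    rule.kind = "flag"; // set that bit on the sentence
    rule.value = right.slice(1);
  } else if (right.startsWith("~")) {
    rule.kind = "interjection";
    rule.value = right;
  } else if (right.length >= 2 && right.startsWith('"') && right.endsWith('"')) {
    rule.kind = "literal"; // quoted result: keep it untokenized
    rule.value = right.slice(1, -1);
  } else {
    rule.value = right.split("+").join(" "); // "+" joins right-side words
  }
  return rule;
}

// Hypothetical sample lines, one per rule kind.
const samples = [
  "i_dunno I+do+not+know",
  "you_know",
  "huh %QUESTIONMARK",
  "wow ~emosurprise",
];
for (const line of samples) {
  console.log(line, "=>", JSON.stringify(parseRule(line)));
}
```

Keeping the parsed rule as a plain object with a `kind` field lets later stages (flag setting, deletion, replacement) be handled separately, in the order the processing list requires.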