en-pos

A better English POS tagger written in JavaScript

  • en-pos

Downloads in past

Stats

StarsIssuesVersionUpdatedCreatedSize
en-pos
4951.0.166 years ago6 years agoMinified + gzip package size for en-pos in KB

Readme

en-pos
A better English POS tagger written in JavaScript

Installation and usage

Install via NPM: ``` npm i --save en-pos ``` How to use ```javascript const Tag = require("en-pos").Tag; var tags = new Tag("this","is","my","sentence") .initial() // initial dictionary and pattern based tagging .smooth() // further context based smoothing .tags; console.log(tags); // "DT","VBZ","PRP$","NN" ```

Annotation Specification

Annotation | Name | Example --- | --- | --- NN | Noun | dog man NNS | Plural noun | dogs men NNP | Proper noun | London Alex NNPS | Plural proper noun | Smiths VB | Base form verb | be VBP | Present form verb | throw VBZ | Present form (3rd person) | throws VBG | Gerund form verb | throwing VBD | Past tense verb | threw VBN | Past participle verb | thrown MD | Modal verb | can shall will may must ought JJ | Adjective | big fast JJR | Comparative adjective | bigger JJS | Superlative adjective | biggest RB | Adverb | not quickly closely RBR | Comparative adverb | less-closely faster RBS | Superlative adverb | fastest DT | Determiner | the a some both PDT | Predeterminer | all quite PRP | Personal Pronoun | I you he she PRP$ | Possessive Pronoun | I you he she POS | Possessive ending | 's IN | Preposition | of by in PR | Particle | up off TO | to | to WDT | Wh-determiner | which that whatever whichever WP | Wh-pronoun | who whoever whom what WP$ | Wh-possessive | whose WRB | Wh-adverb | how where EX | Expletive there | there CC | Coordinating conjugation | & and nor or CD | Cardinal Numbers | 1 7 77 one LS | List item marker | 1 B C One UH | Interjection | ah oh oops FW | Foreign Words | viva mon toujours , | Comma | , : |Mid-sent punct | : ; ... . | Sent-final punct. | . ! ? ( | Left parenthesis | ) } ] ) | Right parenthesis | ( { [ # | Pound sign | # $ | Currency symbols | $ £ ¥ SYM | Other symbols | + * / < > EM | Emojis & emoticons | :)

Accuracy and performance

TL:DR;

  • When smoothing is enabled: 96.43% accuracy (processing 132K tokens in 38 seconds)
  • When smoothing is disabled: 94.4% accuracy (processing 132K tokens in 3 seconds)
---- As of 25 Jan 2017, this library scored 96.43% at the Penn Treebank test (0.3% away from being a state of the art tagger)). Being written in JavaScript, I think it's safe to say that this is the most accurate JavaScript POS tagger, since the only JS library I know of is pos-js which when I tested on the same treebank scored 87.8%, though it was faster than my implementation when smoothing is enabled. However, if performance is what's you're after rather than accuracy, then you have the option to disable smoothing in this library and this will marginally increase performance making this library even faster than pos-js but with far better accuracy (94.4%).

Building from source and testing

  • Build: tsc (requires typescript)
  • Test: node test/test.ts

Credits

  • This project is an optimization and (almost complete) re-writing of Compendium's POS tagger.
  • Compendium's tagger itself was based on fasttag.