stopwords-json

Stopwords for various languages in JSON format.

Per Wikipedia:
Stop words are words which are filtered out prior to, or after, processing of natural language data ... these are some of the most common, short function words, such as the, is, at, which, and on.

To use every language at once, load stopwords-all.json, which is keyed by ISO 639-1 language code. For a single language, see the table below for the individual stopword files.
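As a rough illustration of how the combined file might be used, the sketch below filters stopwords out of a token list. It assumes that each language code in stopwords-all.json maps to a flat array of words. The `stopwordsByLang` excerpt and the `removeStopwords` helper are hypothetical names introduced for this example and are not part of the package.

```javascript
// Hypothetical excerpt with the ASSUMED shape of stopwords-all.json:
// an object keyed by ISO 639-1 code, each value an array of words.
// In real use this object would come from JSON.parse on the file's contents.
const stopwordsByLang = {
  en: ["the", "is", "at", "which", "on"],
  de: ["der", "die", "das", "und"],
};

// Hypothetical helper: return the tokens that are not stopwords
// for the given language code.
function removeStopwords(tokens, lang, table) {
  const words = table[lang];
  if (!Array.isArray(words)) {
    throw new Error(`No stopword list for language code "${lang}"`);
  }
  const stopwords = new Set(words); // Set lookups avoid rescanning the array
  return tokens.filter((token) => !stopwords.has(token.toLowerCase()));
}

const tokens = "The cat is on the mat".split(" ");
console.log(removeStopwords(tokens, "en", stopwordsByLang)); // [ 'cat', 'mat' ]
```

Tokens are lowercased before the lookup on the assumption that the lists are stored in lowercase. Drop that step if case-sensitive matching is wanted.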

Languages

There are a total of 50 supported languages:
Language | Stopword count | Filename
--- | --- | ---
Afrikaans | 51 | af.json
Arabic | 162 | ar.json
Armenian | 45 | hy.json
Basque | 98 | eu.json
Bengali | 116 | bn.json
Breton | 126 | br.json
Bulgarian | 259 | bg.json
Catalan | 218 | ca.json
Chinese | 542 | zh.json
Croatian | 179 | hr.json
Czech | 346 | cs.json
Danish | 101 | da.json
Dutch | 275 | nl.json
English | 570 | en.json
Esperanto | 173 | eo.json
Estonian | 35 | et.json
Finnish | 772 | fi.json
French | 606 | fr.json
Galician | 160 | gl.json
German | 596 | de.json
Greek | 75 | el.json
Hausa | 39 | ha.json
Hebrew | 194 | he.json
Hindi | 225 | hi.json
Hungarian | 781 | hu.json
Indonesian | 355 | id.json
Irish | 109 | ga.json
Italian | 619 | it.json
Japanese | 109 | ja.json
Korean | 679 | ko.json
Latin | 49 | la.json
Latvian | 161 | lv.json
Marathi | 99 | mr.json
Norwegian | 172 | no.json
Persian | 332 | fa.json
Polish | 260 | pl.json
Portuguese | 408 | pt.json
Romanian | 282 | ro.json
Russian | 539 | ru.json
Slovak | 110 | sk.json
Slovenian | 446 | sl.json
Somali | 30 | so.json
Southern Sotho | 31 | st.json
Spanish | 577 | es.json
Swahili | 74 | sw.json
Swedish | 401 | sv.json
Thai | 115 | th.json
Turkish | 279 | tr.json
Yoruba | 60 | yo.json
Zulu | 29 | zu.json

Sources

License and Copyright

Copyright (c) 2017 Peter Graham, contributors. Released under the Apache-2.0 license.