html-article-extractor

A web page content extractor for news websites.

installation
npm install html-article-extractor
usage
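// JSDOM is provided by the jsdom package
var { JSDOM } = require("jsdom");
var htmlArticleExtractor = require("html-article-extractor");

// Parse the page HTML, then pass its <body> element to the extractor
var dom = new JSDOM("...");
var body = dom.window.document.body;
var result = htmlArticleExtractor(body);
console.log(result);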

Outputs:
{
    html: '<div>contents</div>',
    text: 'contents'
}
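
Once you have a result, it can help to check whether anything useful was extracted before storing it. The following is a minimal sketch, assuming only the { html, text } shape shown above. The helper name summarizeResult and its return fields are hypothetical and are not part of html-article-extractor.

```javascript
// Hypothetical helper, NOT part of html-article-extractor.
// Assumes `result` has the { html, text } shape shown in the output above.
function summarizeResult(result) {
    // Fall back to an empty string if the extractor returned nothing usable
    var text = (result && typeof result.text === "string") ? result.text : "";

    // Collapse runs of whitespace so the word count is stable
    var normalized = text.replace(/\s+/g, " ").trim();
    var words = normalized === "" ? [] : normalized.split(" ");

    return {
        isEmpty: words.length === 0,        // true when no text was extracted
        wordCount: words.length,            // number of whitespace-separated words
        preview: normalized.slice(0, 80)    // first 80 characters of the text
    };
}

// Sample input copied from the output shown above
var summary = summarizeResult({ html: "<div>contents</div>", text: "contents" });
console.log(summary);
```

For the sample input this reports one word and a preview of 'contents'. A crawler could, for instance, skip pages where isEmpty is true.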
example
git clone https://github.com/jungyoun/html-article-extractor
cd html-article-extractor
npm install
node example/crawler.js
demo
https://online-article-extractor.herokuapp.com/