Extract data from the DOM using a JSON config



yarn add html-extract-data

npm i -S html-extract-data



import extractFromHTML from 'html-extract-data';

const data = extractFromHTML(
  html, // an HTML DOM element
  {
    query: '.grid-item',
    data: {
      title: 'h2',
      description: { query: 'p', html: true },
    },
  },
);

// Output:
{
  title: 'title',
  description: 'description <b>bold</b>'
}
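The basic example reads the text of the h2 and the HTML of the p inside the element matched by .grid-item. As a rough, library-free sketch of that idea (not html-extract-data's actual implementation), the code below resolves each field against a child element: a string field is shorthand for a query whose textContent is read, and html: true reads innerHTML instead. The names fakeElement, extractField and extractData are hypothetical, and the fake element's querySelector only matches exact keys so the sketch runs in Node without a DOM.

```javascript
// Minimal stand-in for a DOM element (assumption: real code would receive an
// actual HTMLElement). It only implements what this sketch needs.
function fakeElement({ textContent = '', innerHTML = '', children = {} }) {
  return {
    textContent,
    innerHTML,
    // Toy querySelector: looks a child up by its exact selector string
    querySelector: (selector) => children[selector] ?? null,
  };
}

// Hypothetical resolver for one `data` field: a string is shorthand for
// { query }, and html: true returns innerHTML instead of textContent.
function extractField(parent, field) {
  const config = typeof field === 'string' ? { query: field } : field;
  const target = parent.querySelector(config.query);
  // Assumption: a missing child yields undefined
  if (target === null) return undefined;
  return config.html ? target.innerHTML : target.textContent;
}

// Resolves every field of a `data` object against the same parent element
function extractData(parent, data) {
  const out = {};
  for (const [key, field] of Object.entries(data)) {
    out[key] = extractField(parent, field);
  }
  return out;
}

const gridItem = fakeElement({
  children: {
    h2: fakeElement({ textContent: 'title', innerHTML: 'title' }),
    p: fakeElement({
      textContent: 'description bold',
      innerHTML: 'description <b>bold</b>',
    }),
  },
});

const result = extractData(gridItem, {
  title: 'h2',
  description: { query: 'p', html: true },
});
console.log(result);
```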


import extractFromHTML from 'html-extract-data';

const data = extractFromHTML(
  // an HTML DOM element
  html,
  {
    // query element within the html
    query: '.grid-item',
    // if list, it will use querySelectorAll and return an array
    list: true,
    // extract data (mostly attributes) from the element itself
    self: {
      // grab the `data-category` attribute and put it in the `category` field
      'category': 'data-category',
      // convert the value to a number
      'id': { attr: 'data-id', convert: 'number' },
    },
    // extract extra data from child elements
    data: {
      // get the text value from the `h2` element
      title: 'h2',
      // get the html value from the `p` element
      description: { query: 'p', html: true },
      // get the text value from the `.tag` elements, and return as an array
      tags: { query: '.tags > .tag', list: true },

      // option to convert your extracted value, provide a user function
      price: { query: '.price', convert: parseFloat },
      // or use any of the built-in converts (number, float, boolean, date)
      date: { query: '.date', convert: 'date' },
      // when passed a function, you can do your own logic,
      // extract and process any information you want, and return a value;
      // the extract function passed in is bound to the parent element,
      // and the parent element itself is passed as the second argument
      image: (extract, element) => ({
        // in here you can call and pass the same information as above
        alt: extract({ query: '.js-image', attr: 'alt' }),
        // or use the shorthand syntax
        src: extract('.js-image', { attr: 'src' }),
      }),
      // alternative option for the above
      image2: (extract) =>
        // if we just want to extract info from a single element
        // we can just pass a data object with shorthand extracts (see below)
        extract('.js-image', {
          data: { src: 'src', alt: 'alt' },
        }),
      // use the shorthand syntax to extract information from a single element
      link: {
        // specify the query to that element
        query: 'a',
        data: {
          // when passed a string, it will extract the attribute
          href: 'href',
          // when passed as object, it will do the same as normal
          target: { attr: 'target' },
          // when passed true, it will grab the text content
          text: true,
          // this will extract the HTML content
          value: { html: true },
        },
      },
    },
  },
  // pass an additional object that will be merged in each extracted item
  {
    // normal property
    visible: false,
    // allows deep merging (this prepends a default value to the array)
    tags: ['select a value'],
  },
);

Will output:
[
  {
    category: 'js',
    id: 1,
    title: 'title',
    description: 'description <b>bold</b>',
    tags: ['select a value', 'a', 'b', 'c'],
    price: 123.45,
    date: Date(2018-20-08 ... ),
    image: {
      src: 'foo.jpg',
      alt: 'foobar',
    },
    image2: {
      src: 'foo.jpg',
      alt: 'foobar',
    },
    link: {
      href: '',
      target: '_blank',
      text: 'google',
      value: '<b>google</b>',
    },
    visible: false,
  },
]
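The convert option in the full example accepts either a user function (parseFloat) or the name of a built-in converter (number, float, boolean, date). A minimal sketch of how such an option could be resolved is shown below; builtInConverters and convertValue are hypothetical names, and the exact parsing rules the library applies for boolean and date are not documented here, so the ones below are assumptions.

```javascript
// Hypothetical table of built-in converters (not the library's source).
const builtInConverters = {
  number: (value) => Number(value),
  float: (value) => parseFloat(value),
  // Assumption: only the string 'true' (trimmed, case-insensitive) is true
  boolean: (value) => value.trim().toLowerCase() === 'true',
  date: (value) => new Date(value),
};

// A user function is called directly, a string picks a built-in converter,
// and no convert option leaves the extracted string untouched.
function convertValue(value, convert) {
  if (convert === undefined) return value;
  if (typeof convert === 'function') return convert(value);
  const converter = builtInConverters[convert];
  if (!converter) throw new Error(`Unknown converter: ${convert}`);
  return converter(value);
}

console.log(convertValue('1', 'number')); // 1
console.log(convertValue('123.45', parseFloat)); // 123.45
console.log(convertValue(' TRUE ', 'boolean')); // true
console.log(convertValue('2018-08-20', 'date').getUTCFullYear()); // 2018
```

In this sketch number and float differ on trailing text: Number('12px') is NaN, while parseFloat('12px') is 12.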

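The third argument in the full example is merged into every extracted item, and its tags array ends up in front of the extracted tags. One way to get that behaviour is sketched below with a hypothetical mergeDefaults helper; the precedence rules (arrays concatenated defaults-first, plain objects merged recursively, other extracted values overriding the default) are inferred from the example output, not taken from the library.

```javascript
// True only for plain objects, so Dates and arrays are not merged key by key
function isPlainObject(value) {
  return Object.prototype.toString.call(value) === '[object Object]';
}

// Hypothetical deep merge of a defaults object into one extracted item
function mergeDefaults(defaults, item) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(item)) {
    const base = defaults[key];
    if (Array.isArray(base) && Array.isArray(value)) {
      // Defaults first, so 'select a value' ends up in front
      result[key] = [...base, ...value];
    } else if (isPlainObject(base) && isPlainObject(value)) {
      result[key] = mergeDefaults(base, value);
    } else {
      // Assumption: any other extracted value overrides the default
      result[key] = value;
    }
  }
  return result;
}

const merged = mergeDefaults(
  { visible: false, tags: ['select a value'] },
  { title: 'title', tags: ['a', 'b', 'c'] },
);
console.log(merged.tags); // [ 'select a value', 'a', 'b', 'c' ]
```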

This library uses Joi to validate the structure of the input config, but Joi is quite large. That's why the validation is wrapped in process.env.NODE_ENV !== 'production' checks, which means that your build process can strip it out of production bundles.
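A minimal sketch of that development-only guard, assuming a hypothetical validateConfig function as a stand-in for the Joi schema check (it does not use Joi's API) and a hypothetical runExtraction entry point. Only the shape of the process.env.NODE_ENV check matters here.

```javascript
// Hypothetical stand-in for the Joi schema validation
function validateConfig(config) {
  if (typeof config.query !== 'string') {
    throw new TypeError('config.query must be a string');
  }
}

// Hypothetical entry point that only validates outside production
function runExtraction(config) {
  // Once a bundler replaces process.env.NODE_ENV with 'production', this
  // condition is the constant false and a minifier can drop the whole block
  if (process.env.NODE_ENV !== 'production') {
    validateConfig(config);
  }
  return config.query;
}
```

When a bundler replaces process.env.NODE_ENV with the literal 'production' (webpack does this in production mode), the branch becomes unreachable and can be removed; if the validator lives in its own module, tree shaking may then drop it as well.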


View the unit tests to see all the possible ways this module can be used.


In order to build html-extract-data, ensure that you have Git and Node.js installed.
Clone a copy of the repo:
git clone

Change to the html-extract-data directory:
cd html-extract-data

Install dev dependencies:
yarn
Use one of the following main scripts:
yarn build            # build this project
yarn test             # run the unit tests incl coverage
yarn test:dev         # run the unit tests in watch mode
yarn lint             # run tslint on this project








MIT © Tha Narie