
md-convert

Convert an HTML file to Markdown

Install

npm install [-g] md-convert

Usage

htmlmd --src <html-file> [--config <config-file>]

Where:
<html-file> - any valid HTML file
<config-file> - a JSON file to configure the conversion process (optional)
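A minimal sketch of how the two command-line flags above could be parsed. `parseHtmlmdArgs` is a hypothetical helper written for illustration only. It is not part of md-convert and may not match how `htmlmd` actually handles its arguments.

```javascript
// Hypothetical helper: parses `--src <html-file> [--config <config-file>]`.
// Not md-convert's implementation; it only illustrates the CLI contract.
function parseHtmlmdArgs(argv) {
  const args = { src: undefined, config: undefined };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag === "--src" || flag === "--config") {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${flag}`);
      }
      args[flag.slice(2)] = value; // "--src" -> "src", "--config" -> "config"
      i++; // skip the value just consumed
    } else {
      throw new Error(`Unknown argument: ${flag}`);
    }
  }
  if (!args.src) {
    throw new Error("--src <html-file> is required");
  }
  return args;
}

// Equivalent of: htmlmd --src index.html --config my-mdconfig.json
// (with a real process you would pass process.argv.slice(2))
const parsed = parseHtmlmdArgs(["--src", "index.html", "--config", "my-mdconfig.json"]);
console.log(parsed); // { src: 'index.html', config: 'my-mdconfig.json' }
```

Flags are accepted in any order, and `--config` may be left out, matching the optional config file described above.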
JavaScript:
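import { htmlFileToMarkdownFile } from 'md-convert';

const htmlPath = 'index.html';         // HTML file to convert
const configPath = 'my-mdconfig.json'; // JSON conversion config (see below)
// Options passed through to the Turndown HTML-to-Markdown converter
const turndownOptions = {
  headingStyle: 'atx',       // "#"-style headings
  hr: '---',                 // markup used for horizontal rules
  bulletListMarker: '-',     // marker for unordered list items
  codeBlockStyle: 'fenced',  // fenced rather than indented code blocks
  emDelimiter: '*'           // delimiter for emphasized text
};

// Returns the path of the generated markdown file
const mdPath = htmlFileToMarkdownFile(htmlPath, configPath, turndownOptions);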
---

API

readHtmlFile

/**
 * read HTML file and return HTMLDocument
 * @param {String} path
 * @return {HTMLDocument}
 */

const document = readHtmlFile(htmlPath);

readConfigFile

/**
 * read JSON config file and return MdcConfig object
 * @param {String} path
 * @return {MdcConfig}
 */

const config = readConfigFile(configPath);

createFrontMatter

/**
 * extract front matter from document
 * @param {HTMLDocument} document
 * @param {FrontMatterConfig} fmConfig
 * @return {FrontMatterString}
 */

const frontMatterStr = createFrontMatter(document, frontMatterConfig);

removeElements

/**
 * remove matching elements from document
 * @param {HTMLDocument} document
 * @param {CSSSelector} selector
 */

removeElements(document, omitConfig);
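The config's `omit` property is an array, while `removeElements` is documented as taking a `CSSSelector`. A minimal sketch of the removal step, assuming a standard DOM `Document` (for example from a browser or jsdom) and accepting either one selector or an array of them. `removeMatching` is a hypothetical name, not md-convert's code.

```javascript
// Hypothetical sketch of the "omit" step; not md-convert's implementation.
// Accepts a single CSS selector string or an array of selector strings.
function removeMatching(document, selectors) {
  const list = Array.isArray(selectors) ? selectors : [selectors];
  let removed = 0;
  for (const selector of list) {
    // querySelectorAll returns a static NodeList, so removing while iterating is safe.
    for (const el of document.querySelectorAll(selector)) {
      el.remove();
      removed++;
    }
  }
  return removed; // number of matched elements removed
}
```

For example, `removeMatching(document, ["script", "nav", "footer"])` would strip scripts and page chrome before conversion.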

transformElements

/**
 * transform matching elements in document
 * @param {HTMLDocument} document
 * @param {Transformers} transformers
 */

transformElements(document, transformConfig);

elementsToMarkdown

/**
 * convert matching elements to markdown
 * @param {HTMLDocument} document
 * @param {CSSSelector} selector
 * @param {TurndownOptions} tdOptions
 * @return {MarkdownStr}
 */

const markdownText = elementsToMarkdown(document, cssSelector, turndownOptions);

htmlFileToMarkdownFile

/**
 * convert html file to markdown file
 * @param {String} htmlPath
 * @param {String} configPath
 * @param {TurndownOptions} tdOptions
 * @return {String} path of the generated markdown file
 */

const mdPath = htmlFileToMarkdownFile(htmlPath, configPath, turndownOptions);

JSON Config File Properties

{
  "description": "",
  "frontMatter": {},
  "omit": [],
  "transform": [],
  "select": []
}

Config Properties

  • description - describes what this configuration is for
  • frontMatter - how to generate the "front matter" for the markdown file
  • omit - CSS selectors of HTML elements to remove from the document
  • transform - rules to transform elements
  • select - CSS selectors of HTML elements to include in the conversion
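A minimal sketch of loading such a config and filling in any property the file leaves out with the empty defaults from the template above. `parseMdcConfig` is a hypothetical helper, not md-convert's `readConfigFile`. It takes JSON text so it stays self-contained; in Node you could pass it `fs.readFileSync(path, "utf8")`.

```javascript
// Hypothetical helper, not md-convert's readConfigFile.
function defaultMdcConfig() {
  // Fresh object on every call so configs never share (and mutate) the same arrays.
  return { description: "", frontMatter: {}, omit: [], transform: [], select: [] };
}

function parseMdcConfig(jsonText) {
  const parsed = JSON.parse(jsonText);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("Config file must contain a JSON object");
  }
  // Properties present in the file override the defaults (shallow merge).
  return { ...defaultMdcConfig(), ...parsed };
}

const config = parseMdcConfig('{"description": "blog post", "omit": ["nav", "footer"]}');
console.log(config.omit);   // [ 'nav', 'footer' ]
console.log(config.select); // []
```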

Actions are applied in this order:
  1. omit - remove elements from the document
  2. transform - modify certain elements
  3. select - select elements for conversion
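A sketch of how the documented API functions could be chained in the order above. This is a hypothetical composition, not the actual body of `htmlFileToMarkdownFile`. Three points are assumptions: that the front matter is extracted before any elements are removed, that `config.omit` and `config.select` can be passed straight to `removeElements` and `elementsToMarkdown`, and that front matter and body are joined with a newline. The functions are passed in as `api` so the sketch stays self-contained; in practice they would be md-convert's exports.

```javascript
// Hypothetical composition of the documented API, in the order omit -> transform -> select.
// `api` supplies readHtmlFile, readConfigFile, createFrontMatter,
// removeElements, transformElements and elementsToMarkdown.
function convertHtmlToMarkdownText(api, htmlPath, configPath, turndownOptions) {
  const document = api.readHtmlFile(htmlPath);
  const config = api.readConfigFile(configPath);

  // Assumption: read front matter first so <title>/<meta> selectors still match.
  const frontMatter = api.createFrontMatter(document, config.frontMatter);

  api.removeElements(document, config.omit);         // 1. omit
  api.transformElements(document, config.transform); // 2. transform
  // 3. select matching elements and convert them with Turndown options
  const body = api.elementsToMarkdown(document, config.select, turndownOptions);

  // Assumption: front matter and body are joined with a single newline.
  return frontMatter + "\n" + body;
}
```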

frontMatter Properties

{
  "<item_name>": {
    "selector": "<any_css_selector>",
    "attrName": "<element_attribute_name_if_any>"
  }
}
If "attrName" is omitted, the front matter content is taken from the element's textContent.
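A minimal sketch of the rule just described: find the element with `selector`, then read `attrName` if given, otherwise `textContent`. `extractFrontMatterValues` is a hypothetical helper, not md-convert's `createFrontMatter`; it assumes any DOM-like document that implements `querySelector` (a browser document, jsdom, and so on), and skipping unmatched items is also an assumption.

```javascript
// Hypothetical sketch of front matter extraction; not md-convert's code.
function extractFrontMatterValues(document, fmConfig) {
  const values = {};
  for (const [name, rule] of Object.entries(fmConfig)) {
    const el = document.querySelector(rule.selector); // first matching element
    if (el === null) continue; // assumption: items with no match are skipped
    values[name] = rule.attrName
      ? el.getAttribute(rule.attrName) // e.g. the "content" of a <meta> tag
      : el.textContent;                // no attrName: use the element's text
  }
  return values;
}
```

In a browser, `extractFrontMatterValues(document, config.frontMatter)` would return an object such as `{ title: "...", url: "..." }`.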

Example

HTML:
<title>Hello World Title</title>
<meta property="og:url" content="https://helloworld.com/helloworld.html">

Config:
{
  "frontMatter": {
    "title": {
      "selector": "title"
    },
    "url": {
      "selector": "meta[property='og:url']",
      "attrName": "content"
    }
  }
}

Resulting Front Matter:

```text
title: Hello World Title
url: https://helloworld.com/helloworld.html
```
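A minimal sketch of turning extracted values into the `name: value` lines of the example above. `formatFrontMatter` is a hypothetical helper, not md-convert's code. It does not add `---` delimiters or YAML quoting, so values containing characters such as `: ` or `#` might need extra escaping.

```javascript
// Hypothetical sketch: one "name: value" line per front matter item, in insertion order.
function formatFrontMatter(values) {
  return Object.entries(values)
    .map(([name, value]) => `${name}: ${value}`)
    .join("\n");
}

console.log(formatFrontMatter({
  title: "Hello World Title",
  url: "https://helloworld.com/helloworld.html",
}));
// title: Hello World Title
// url: https://helloworld.com/helloworld.html
```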
(More details to follow; until then, see the config directory for examples.)