Project Overview
Switch to MD is a library that converts a wide range of file formats to structured Markdown so that AI agents can work with document content as text. The core approach abstracts conversion into a "Sniff → Route → Adapt → Normalize" pipeline, with optional plugins that call local or cloud models for OCR and semantic analysis.
Core Features
- One line of code: convert('file.pdf')
- Lightweight core: ~500KB, no model dependency
- Flexible configuration: Environment variables / config file / code config
- Agent friendly: Structured MD output with frontmatter
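The frontmatter is what makes the output easy to consume programmatically. As a minimal sketch, an agent could separate metadata from body as shown here. The helper name `splitFrontmatter` is hypothetical and not part of the library, and the code assumes flat `key: value` frontmatter like the sample under Output Format.

```typescript
// Hypothetical helper, NOT part of switch-to-md: splits converted Markdown
// into frontmatter fields and body. Handles only flat `key: value` lines.
interface ParsedDocument {
  meta: Record<string, string>;
  body: string;
}

function splitFrontmatter(markdown: string): ParsedDocument {
  const lines = markdown.split(/\r?\n/);
  // Frontmatter must open with `---` on the very first line.
  if (lines[0] !== '---') {
    return { meta: {}, body: markdown };
  }
  const end = lines.indexOf('---', 1);
  if (end === -1) {
    return { meta: {}, body: markdown };
  }
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    // Split on the first colon only, so values such as timestamps survive.
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join('\n') };
}

const sample = [
  '---',
  'source: report.pdf',
  'type: PDF',
  'pages: 12',
  'extracted_at: 2026-04-29T00:00:00Z',
  '---',
  '# Document Title',
].join('\n');

const parsed = splitFrontmatter(sample);
console.log(parsed.meta['source'], parsed.meta['pages']);
```

Nested YAML values would need a real YAML parser. This sketch deliberately covers only the flat fields shown in the sample output.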
Installation
# Core package (document types work directly)
npm i switch-to-md
# Image processing (optional)
npm i @switch-to-md/vision-openai # OpenAI Vision
npm i @switch-to-md/vision-local # Local Tesseract
# Audio processing (optional)
npm i @switch-to-md/audio-openai # OpenAI Whisper
npm i @switch-to-md/audio-local # Local Whisper
Quick Start
import { convert } from 'switch-to-md';
// Convert PDF
const pdfMd = await convert('report.pdf');
// Convert Word (separate variable name, since `const` cannot be redeclared in the same scope)
const docxMd = await convert('document.docx');
// Output to file
await convert('report.pdf', { output: 'report.md' });
// Batch conversion
const results = await convert(['a.pdf', 'b.docx']);
Supported Formats
| Category | Formats | Notes |
|---|---|---|
| Documents | PDF, DOCX, XLSX, PPTX, TXT, HTML, EPUB | No config needed |
| Images | PNG, JPG, GIF, BMP, TIFF, WebP | Requires vision plugin |
| Audio | MP3, WAV, FLAC, OGG, M4A | Requires audio plugin |
| Code | JSON, YAML, XML, CSV, multi-language source | No config needed |
| Vector | SVG | No config needed |
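The table can also be read as a lookup from file extension to category and required plugin. The sketch below encodes it that way. All names (`describeFormat`, `EXTENSIONS`, `REQUIRED_PLUGIN`) are illustrative assumptions rather than the library's API, and "multi-language source" is left out because the table does not enumerate those extensions.

```typescript
// Illustrative lookup built from the Supported Formats table; the names
// here are hypothetical and do not come from the switch-to-md API.
type Category = 'document' | 'image' | 'audio' | 'code' | 'vector';

const EXTENSIONS: Record<Category, string[]> = {
  document: ['pdf', 'docx', 'xlsx', 'pptx', 'txt', 'html', 'epub'],
  image: ['png', 'jpg', 'gif', 'bmp', 'tiff', 'webp'],
  audio: ['mp3', 'wav', 'flac', 'ogg', 'm4a'],
  code: ['json', 'yaml', 'xml', 'csv'],
  vector: ['svg'],
};

// According to the table, only images and audio need an extra plugin.
const REQUIRED_PLUGIN: Partial<Record<Category, string>> = {
  image: 'vision',
  audio: 'audio',
};

interface FormatInfo {
  category: Category;
  plugin: string | undefined; // undefined means "no config needed"
}

function describeFormat(path: string): FormatInfo | undefined {
  const dot = path.lastIndexOf('.');
  if (dot === -1) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  for (const category of Object.keys(EXTENSIONS) as Category[]) {
    if (EXTENSIONS[category].indexOf(ext) !== -1) {
      return { category, plugin: REQUIRED_PLUGIN[category] };
    }
  }
  return undefined; // extension not listed in the table
}

console.log(describeFormat('scan.png'));
console.log(describeFormat('report.pdf'));
```

A check like this could run before conversion to tell the user which optional package to install, instead of failing partway through.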
Configuration
import { configure } from 'switch-to-md';
// OpenAI
configure({
provider: 'openai',
apiKey: process.env.OPENAI_API_KEY,
});
// Local model
configure({
vision: { provider: 'local', model: 'tesseract' },
});
CLI Usage
# Convert file
npx switch-to-md convert report.pdf
# Batch convert
npx switch-to-md convert ./docs/**/*.pdf -o ./output/
# Setup wizard
npx switch-to-md setup
Output Format
---
source: report.pdf
type: PDF
pages: 12
extracted_at: 2026-04-29T00:00:00Z
backend: pdf-parse
---
# Document Title
## Body
[Content]
## Tables
| Col1 | Col2 |
|------|------|
| Data | Data |
Conversion Pipeline
Sniff → Route → Adapt → Normalize
- Sniff: detect the file type
- Route: select a handler
- Adapt: execute the conversion
- Normalize: unify the output
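To make the four stages concrete, here is a rough sketch of how they might compose. It is not taken from the switch-to-md source: the function names, the `FileType` union, and the placeholder handlers are all assumptions. Sniffing uses the well-known leading bytes of PDF (`%PDF`) and PNG files purely for illustration.

```typescript
// Hypothetical sketch of the four stages; none of these names come from
// the switch-to-md source.
type FileType = 'pdf' | 'png' | 'text';
type Handler = (bytes: Uint8Array) => string;

// Sniff: detect the type from leading magic bytes rather than the extension.
function sniff(bytes: Uint8Array): FileType {
  const startsWith = (sig: number[]) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return 'pdf'; // "%PDF"
  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return 'png'; // "\x89PNG"
  return 'text'; // fallback when no signature matches
}

// Route: one handler per detected type. The pdf and png handlers are
// placeholders standing in for a real parser or OCR plugin.
const handlers: Record<FileType, Handler> = {
  pdf: () => '[PDF text would be extracted here]',
  png: () => '[OCR output would go here]',
  text: (bytes) => {
    // Treats each byte as one character, so this is ASCII only.
    let out = '';
    bytes.forEach((b) => { out += String.fromCharCode(b); });
    return out;
  },
};

// Normalize: wrap any handler output in the same frontmatter + body shape.
function normalize(type: FileType, body: string): string {
  return ['---', `type: ${type.toUpperCase()}`, '---', body.trim(), ''].join('\n');
}

function convertBytes(bytes: Uint8Array): string {
  const type = sniff(bytes);       // Sniff
  const handler = handlers[type];  // Route
  const raw = handler(bytes);      // Adapt
  return normalize(type, raw);     // Normalize
}

const plain = new Uint8Array([0x48, 0x69, 0x0a]); // "Hi\n"
console.log(convertBytes(plain));
```

Keeping Route as a plain lookup table means a plugin only has to register a handler for a type. The other three stages stay unchanged.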
Current Status
The core implementation is complete and supports converting PDF, Word, Excel, PowerPoint, image, audio, and other mainstream formats to Markdown.
Related Links
Last updated: 2026-04-29