Page Content to Markdown version history - 2 versions
Page Content to Markdown by Jared
Be careful with old versions! These versions are displayed for testing and reference purposes only. You should always use the latest version of an add-on.
Latest version
Version 1.0.1
Released May 12, 2026 - 119.44 KB
Works with Firefox 109.0 and later
Fixed
- General extractor picks the largest matching candidate per selector, not the first. On The Verge, the first <article> on a story page is a related-cards stub; first-match-wins picked it and returned empty markdown. Score every match by textContent.length and pick the largest qualifying candidate.
- Tighter content-significance threshold. Bump the hasSignificantContent floor to ≥3 <p> descendants and ≥500 chars of trimmed text. Rejects related-card grids that previously slipped through because their aggregated link text passed the old 50-char gate.
- SVG elements no longer crash Turndown mid-traversal. SVG className is an SVGAnimatedString, not a string; calling .toLowerCase() on it threw and Turndown returned '' for the whole page. Read class via getAttribute('class') throughout the converter, with a fallback to .baseVal for safety. Eliminates a silent empty-output failure mode on news sites that ship inline SVG icons.
- Visible junk inside the article body no longer ships through. Expanded the non-content substring regex with author-bio, author-card, byline-bio, topics-list, tags-list, tags-row, subscribe, affiliate, disclosure, disclaimer, share-row, share-icons, social-icons, related-articles, related-stories, read-more-cta, keep-reading, frequently-asked, faq-, further-reading, comments-section. Clears author-bio cards on TechCrunch / Tom's Guide, the trailing FAQ section on Mashable, and the end-of-post subscribe widget on Substack.
- Structural section rejector for related/topics/FAQ/subscribe blocks. Any <section> or <div> whose first heading (looking one level deep through a wrapper div) reads as Topics, Tags, Related…, Frequently Asked…, Further Reading, Read Next, Keep Reading, Recommended, or Subscribe to… gets rejected wholesale, regardless of class names. Catches framework-generated wrappers (mx-auto mt-12, pc-paddingTop-32) that didn't pattern-match before.
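
The largest-candidate selection and the tighter significance floor described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the add-on's actual code: `Candidate`, `pickLargestCandidate`, and the constant names are hypothetical, and plain objects stand in for DOM elements so the sketch runs outside a browser. Only `hasSignificantContent`, the ≥3 `<p>` and ≥500-character floor, and scoring by `textContent.length` come from the notes.

```typescript
// Hypothetical stand-in for a matched DOM element. In a browser these fields
// would come from element.textContent and element.querySelectorAll("p").length.
interface Candidate {
  label: string;          // illustrative name, used only to identify the result
  textContent: string;
  paragraphCount: number; // number of <p> descendants
}

// Floors quoted in the release notes; the constant names are assumptions.
const MIN_PARAGRAPHS = 3;
const MIN_TRIMMED_CHARS = 500;

function hasSignificantContent(c: Candidate): boolean {
  return (
    c.paragraphCount >= MIN_PARAGRAPHS &&
    c.textContent.trim().length >= MIN_TRIMMED_CHARS
  );
}

// Score every match by text length and keep the largest qualifying one,
// instead of returning the first match.
function pickLargestCandidate(matches: Candidate[]): Candidate | null {
  let best: Candidate | null = null;
  for (const c of matches) {
    if (!hasSignificantContent(c)) continue;
    if (best === null || c.textContent.length > best.textContent.length) {
      best = c;
    }
  }
  return best;
}

// A related-cards stub appears first in the match list, followed by the story.
const relatedStub: Candidate = {
  label: "related-stub",
  textContent: "More from this site",
  paragraphCount: 0,
};
const story: Candidate = {
  label: "story",
  textContent: "Lorem ipsum ".repeat(100),
  paragraphCount: 8,
};
const picked = pickLargestCandidate([relatedStub, story]);
console.log(picked ? picked.label : "none"); // "story"
```

With first-match-wins, the stub would have been returned. Here it fails the floor and the longer story is picked; if no match qualifies, the function returns `null` so the caller can fall through to the next selector.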
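
The SVG fix above comes down to never calling string methods on `className` directly. A minimal sketch, assuming a hypothetical helper named `readClassName` and a hand-rolled `ClassCarrier` type in place of real DOM elements (neither name is from the add-on):

```typescript
// Minimal structural type covering what the sketch needs. In a browser,
// HTMLElement.className is a string, while SVGElement.className is an
// SVGAnimatedString that exposes the value as .baseVal.
interface ClassCarrier {
  getAttribute(name: string): string | null;
  className?: string | { baseVal: string };
}

// Hypothetical helper: always returns a plain lowercase string and never throws.
function readClassName(el: ClassCarrier): string {
  // Preferred path: the attribute is a plain string for HTML and SVG alike.
  const attr = el.getAttribute("class");
  if (attr !== null) return attr.toLowerCase();

  // Fallback for safety when the attribute is absent.
  const cn = el.className;
  if (typeof cn === "string") return cn.toLowerCase();
  if (cn !== undefined && typeof cn.baseVal === "string") {
    return cn.baseVal.toLowerCase();
  }
  return "";
}

// Fake elements so the sketch runs without a DOM.
const htmlLike: ClassCarrier = {
  getAttribute: (name) => (name === "class" ? "Article-Body" : null),
  className: "Article-Body",
};
// Contrived case that exercises the .baseVal fallback.
const svgLike: ClassCarrier = {
  getAttribute: () => null,
  className: { baseVal: "Icon Share" },
};

console.log(readClassName(htmlLike)); // "article-body"
console.log(readClassName(svgLike));  // "icon share"
```

Routing every class lookup through one helper like this means a single unexpected element type cannot blank the whole conversion.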
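
The expanded non-content pattern above can be approximated by joining the listed substrings into one case-insensitive regex. The sketch below includes only the tokens named in these notes; whatever the pattern contained before 1.0.1 is not shown here and is left out. The names `ADDED_TOKENS`, `NON_CONTENT_RE`, and `looksLikeNonContent` are hypothetical, and matching against class or id strings is an assumption.

```typescript
// Substrings listed in the 1.0.1 notes. Earlier tokens in the add-on's
// pattern are not documented in the notes and are omitted.
const ADDED_TOKENS: string[] = [
  "author-bio", "author-card", "byline-bio", "topics-list", "tags-list",
  "tags-row", "subscribe", "affiliate", "disclosure", "disclaimer",
  "share-row", "share-icons", "social-icons", "related-articles",
  "related-stories", "read-more-cta", "keep-reading", "frequently-asked",
  "faq-", "further-reading", "comments-section",
];

// None of the tokens contain regex metacharacters, so joining with "|" is safe.
const NON_CONTENT_RE = new RegExp(ADDED_TOKENS.join("|"), "i");

// Hypothetical helper: true when a class or id string contains any token.
function looksLikeNonContent(classOrId: string): boolean {
  return NON_CONTENT_RE.test(classOrId);
}

console.log(looksLikeNonContent("post-footer Author-Bio card")); // true
console.log(looksLikeNonContent("article-body prose"));          // false
```

Because this is substring matching, a short token such as `subscribe` also matches longer names like `subscribe-widget`, which is the intent for end-of-post widgets but is worth keeping in mind when adding new tokens.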
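
The structural rejector above keys on heading text rather than class names. A rough sketch, with several assumptions: `SimpleNode` is a hypothetical simplified tree standing in for DOM elements, the function names are illustrative, the "…" headings are treated as prefixes while the others must match the whole heading, and "first heading" is read as the first heading found among direct children or inside a direct-child wrapper `<div>`.

```typescript
// Hypothetical simplified node tree standing in for DOM elements.
interface SimpleNode {
  tag: string;             // lowercase tag name, e.g. "section", "div", "h2"
  text?: string;           // text content, used for headings only
  children?: SimpleNode[];
}

const HEADING_TAG = /^h[1-6]$/;

// Heading texts named in the release notes. Exact matching rules are an
// assumption: prefix match for the "…" entries, whole-title match otherwise.
const REJECT_HEADING =
  /^(?:topics|tags|further reading|read next|keep reading|recommended|related\b.*|frequently asked\b.*|subscribe to\b.*)$/i;

// Find the first heading among direct children, looking one level deep
// through a wrapper <div>.
function firstHeadingText(block: SimpleNode): string | null {
  for (const child of block.children ?? []) {
    if (HEADING_TAG.test(child.tag)) return child.text ?? "";
    if (child.tag === "div") {
      for (const inner of child.children ?? []) {
        if (HEADING_TAG.test(inner.tag)) return inner.text ?? "";
      }
    }
  }
  return null;
}

// Reject a <section> or <div> wholesale when its first heading is on the list.
function shouldRejectBlock(block: SimpleNode): boolean {
  if (block.tag !== "section" && block.tag !== "div") return false;
  const heading = firstHeadingText(block);
  return heading !== null && REJECT_HEADING.test(heading.trim());
}

// A framework-generated wrapper with no telling class names.
const frameworkWrapper: SimpleNode = {
  tag: "div",
  children: [
    { tag: "div", children: [{ tag: "h3", text: "Related Stories" }] },
    { tag: "ul" },
  ],
};
const articleBody: SimpleNode = {
  tag: "section",
  children: [{ tag: "h2", text: "How the extractor works" }, { tag: "p" }],
};

console.log(shouldRejectBlock(frameworkWrapper)); // true
console.log(shouldRejectBlock(articleBody));      // false
```

Since only heading text is inspected, wrappers with utility-only class names (such as `mx-auto mt-12`) are caught the same way as semantically named ones.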
Source code released under the MIT License