Link Checking CI Integration
This project includes automated link checking for new and modified content during CI builds.
How It Works
The link checking system automatically:
- Detects changed files: Uses git diff to find new or modified markdown files
- Extracts links: Finds both regular links [text](url) and images ![alt](url)
- Validates links:
  - External links: Makes HTTP HEAD requests to verify they're reachable
  - Internal links: Checks if the referenced files exist on disk
- Provides summary: Outputs detailed statistics and lists any broken links
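The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in test/link-check-summary.js: the base ref origin/main, the function names, and the regular expression are all assumptions made for the example.

```javascript
const { execFileSync } = require('child_process');

// Assumption: changes are compared against a base ref such as origin/main.
function listChangedMarkdown(baseRef = 'origin/main') {
  const out = execFileSync(
    'git',
    // --diff-filter=AM keeps only Added and Modified paths.
    ['diff', '--name-only', '--diff-filter=AM', `${baseRef}...HEAD`],
    { encoding: 'utf8' }
  );
  return filterMarkdown(out);
}

// Keep only markdown paths from newline-separated git output.
function filterMarkdown(diffOutput) {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().endsWith('.md'));
}

// Matches [text](url) and ![alt](url), with an optional "title" after the URL.
const LINK_PATTERN = /(!?)\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;

// Split the URLs found in a markdown string into regular links and images.
function extractLinks(markdown) {
  const links = [];
  const images = [];
  for (const match of markdown.matchAll(LINK_PATTERN)) {
    const isImage = match[1] === '!';
    (isImage ? images : links).push(match[3]);
  }
  return { links, images };
}
```

A regular expression like this will miss reference-style links and URLs containing parentheses; a markdown parser would be more robust if those matter.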
Features
- 📄 File-by-file analysis of changed content only
- 🌐 External link validation with timeout and error handling
- 📁 Internal link verification against local filesystem
- 🖼️ Image link checking for both external and internal images
- 📊 Comprehensive statistics and summary reporting
- ❌ Broken link details with specific error information
- ⚡ Duplicate URL deduplication to avoid redundant checks
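One way the deduplication could work is to group every discovered URL by the files that reference it, so each URL is requested once and a failure can still be reported per file. The groupByUrl helper and the { file, url } entry shape below are hypothetical, chosen only for this sketch.

```javascript
// Group link occurrences by URL so each distinct URL is checked only once.
// Each entry is assumed to look like { file: 'path.md', url: 'https://...' }.
function groupByUrl(entries) {
  const byUrl = new Map();
  for (const { file, url } of entries) {
    if (!byUrl.has(url)) byUrl.set(url, new Set());
    byUrl.get(url).add(file);
  }
  return byUrl;
}

// Usage: iterate over distinct URLs, keeping the referencing files for reporting.
const example = groupByUrl([
  { file: 'content/en/docs/guide.md', url: 'https://example.com' },
  { file: 'content/zh/blog/new-post.md', url: 'https://example.com' },
  { file: 'content/zh/blog/new-post.md', url: 'https://example.org' },
]);
for (const [url, files] of example) {
  console.log(`${url} referenced by ${files.size} file(s)`);
}
```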
Usage
In CI (Automatic)
Link checking runs automatically on every build via GitHub Actions. Both steps set continue-on-error: true, so broken links are reported without failing the build:
- name: Run tests
  run: npm test
  continue-on-error: true
- name: Run link check summary
  run: node test/link-check-summary.js
  continue-on-error: true
Manual Testing
# Run full test suite (includes link checking)
npm test
# Run only link checking with summary
npm run link-check
Sample Output
📝 Link Check Summary
==================================================
📄 Files checked: 2
🔍 Checking: content/zh/blog/new-post.md
📊 Links: 3 external, 2 internal, 1 images
🔍 Checking: content/en/docs/guide.md
📊 Links: 1 external, 0 internal, 0 images
📊 Summary Statistics
------------------------------
🔗 Total links found: 6
🖼️ Total images found: 1
🌐 External links: 4
📁 Internal links: 2
❌ Broken external links: 1
❌ Broken internal links: 0
🚫 Broken External Links
------------------------------
🔗 content/zh/blog/new-post.md: https://example-broken-link.com
Status: 404
✅ All internal links are valid!
❌ Found 1 broken link(s) in total.
==================================================
Configuration
Timeout Settings
External link checks have a 10-second timeout by default. You can change it in test/link-check-summary.js by editing the timeout option (in milliseconds) of the request call:
const req = lib.request(url, { method: 'HEAD', timeout: 10000 }, (res) => {
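If editing the constant by hand is inconvenient, the timeout could be read from the environment instead. The sketch below is an assumption about how that might look, not the existing implementation: the LINK_CHECK_TIMEOUT_MS variable and the resolveTimeoutMs and headRequest helpers are hypothetical names. Note that in Node.js the timeout option only triggers a 'timeout' event; the request still has to be destroyed explicitly.

```javascript
const http = require('http');
const https = require('https');

const DEFAULT_TIMEOUT_MS = 10000;

// Hypothetical helper: read the timeout from an env-like object, else use 10 s.
function resolveTimeoutMs(env = process.env) {
  const parsed = Number.parseInt(env.LINK_CHECK_TIMEOUT_MS ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}

// Send a HEAD request and always resolve with a result object (never reject).
function headRequest(url, timeoutMs = resolveTimeoutMs()) {
  return new Promise((resolve) => {
    const fail = (err) => resolve({ url, ok: false, error: err.code || err.message });
    const lib = url.startsWith('https:') ? https : http;
    let req;
    try {
      req = lib.request(url, { method: 'HEAD', timeout: timeoutMs }, (res) => {
        res.resume(); // discard any body so the socket is released
        // Assumption: any status below 400 counts as reachable.
        resolve({ url, ok: res.statusCode < 400, status: res.statusCode });
      });
    } catch (err) {
      fail(err); // e.g. a malformed URL rejected before any network activity
      return;
    }
    // The 'timeout' event only signals an idle socket; abort the request here.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', fail);
    req.end();
  });
}
```

With this shape, a run such as LINK_CHECK_TIMEOUT_MS=3000 npm run link-check would shorten the timeout without touching the source.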
Ignoring Links
- Root-relative links (starting with /) are automatically skipped as they're handled by Hugo
- To ignore specific domains: Consider adding them to the existing scripts/ignore-urls.txt for htmlproofer-based checking
Link Types Checked
- ✅ External HTTP/HTTPS links
- ✅ Internal relative links
- ✅ Image references (both external and internal)
- ❌ Root-relative links (skipped - handled by Hugo)
- ❌ Hash/anchor links (not currently validated)
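The categories above amount to a small classification step before any check is made. A minimal sketch, assuming hypothetical classifyLink and shouldCheck helpers; the extra 'other' bucket for schemes such as mailto: is an addition for the example and is not described in this document.

```javascript
// Classify a link target using the categories listed above.
// 'other' (mailto:, tel:, ...) is an assumption added for this sketch.
function classifyLink(url) {
  if (/^https?:\/\//i.test(url)) return 'external';
  if (url.startsWith('#')) return 'anchor';
  if (url.startsWith('/')) return 'root-relative';
  if (/^[a-z][a-z0-9+.-]*:/i.test(url)) return 'other';
  return 'internal';
}

// Only external and internal relative links are validated.
function shouldCheck(url) {
  const kind = classifyLink(url);
  return kind === 'external' || kind === 'internal';
}
```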
Error Handling
The system handles various error scenarios:
- Network timeouts: 10-second timeout for external requests
- DNS resolution failures: Reports EAI_AGAIN errors
- HTTP errors: Reports status codes (404, 500, etc.)
- Missing files: Checks local filesystem for internal links
- Malformed URLs: Gracefully handles parsing errors
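For the missing-file and malformed-URL cases, an internal link check might look like the sketch below. It assumes links are resolved relative to the directory of the markdown file that contains them; the checkInternalLink name and the shape of the result object are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Check that a relative link from sourceFile points at an existing path.
function checkInternalLink(sourceFile, link) {
  // Drop any ?query or #fragment; anchors are not validated.
  const target = link.split(/[?#]/)[0];
  if (target === '') return { link, ok: true, skipped: true };

  let decoded;
  try {
    decoded = decodeURIComponent(target); // handles %20 and similar escapes
  } catch {
    return { link, ok: false, error: 'malformed URL' };
  }

  const resolved = path.resolve(path.dirname(sourceFile), decoded);
  return fs.existsSync(resolved)
    ? { link, ok: true }
    : { link, ok: false, error: 'file not found' };
}
```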
Contributing
When adding new link checking features:
- Update test/link-check-summary.js for core functionality
- Ensure backward compatibility with existing test/content.test.js
- Test with both valid and broken links
- Update this documentation
Related Files
- test/link-check-summary.js - Core link checking and summary logic
- test/content.test.js - Integration with existing test suite
- .github/workflows/deploy.yml - CI integration
- scripts/broken-link-checker.sh - Alternative full-site checking (htmlproofer)