Link Checking CI Integration

This project includes automated link checking for new and modified content during CI builds.

How It Works

The link checking system automatically:

  1. Detects changed files: Uses git diff to find new or modified markdown files
  2. Extracts links: Finds both regular links [text](url) and images ![alt](url)
  3. Validates links:
       • External links: Makes HTTP HEAD requests to verify they're reachable
       • Internal links: Checks if the referenced files exist on disk
  4. Provides summary: Outputs detailed statistics and lists any broken links
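The extraction step can be sketched as follows. This is a minimal illustration, not the code in test/link-check-summary.js: the names LINK_PATTERN and extractLinks are hypothetical, and the regular expression is an assumption that covers inline links and images only (no reference-style links, no URLs containing parentheses).

```javascript
// Hypothetical helper, not taken from test/link-check-summary.js.
// Matches [text](url) and ![alt](url); an optional "title" after the URL is ignored.
const LINK_PATTERN = /(!?)\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;

function extractLinks(markdown) {
  const links = [];
  for (const match of markdown.matchAll(LINK_PATTERN)) {
    links.push({
      url: match[3],
      isImage: match[1] === '!', // a leading "!" marks an image
      isExternal: /^https?:\/\//i.test(match[3]),
    });
  }
  return links;
}

const sample = 'See [the guide](../docs/guide.md) and ![logo](https://example.com/logo.png).';
console.log(extractLinks(sample));
```

A regular expression keeps the sketch dependency-free. A full markdown parser would be more robust against nested brackets and code spans.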

Features

  • 📄 File-by-file analysis of changed content only
  • 🌐 External link validation with timeout and error handling
  • 📁 Internal link verification against local filesystem
  • 🖼️ Image link checking for both external and internal images
  • 📊 Comprehensive statistics and summary reporting
  • Broken link details with specific error information
  • URL deduplication to avoid redundant checks
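One way to avoid redundant checks is to cache the result per URL. The sketch below is an illustration under that assumption, not the project's implementation. createDedupedChecker and the stand-in checker are hypothetical names.

```javascript
// Hypothetical helper: wraps any checker so each distinct URL is checked once.
function createDedupedChecker(checkUrl) {
  const results = new Map(); // url -> result (or promise) of the first check
  return (url) => {
    if (!results.has(url)) {
      results.set(url, checkUrl(url));
    }
    return results.get(url);
  };
}

// Usage with a stand-in checker that records how often it is called.
let calls = 0;
const check = createDedupedChecker(async (url) => {
  calls += 1;
  return { url, ok: true };
});

Promise.all([check('https://example.com'), check('https://example.com')])
  .then(() => console.log(`checked 2 links with ${calls} request(s)`));
```

Storing the promise rather than the resolved value means a second request for the same URL reuses the in-flight check instead of starting a new one.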

Usage

In CI (Automatic)

Link checking runs automatically on every build via GitHub Actions. Both steps set continue-on-error: true, so broken links are reported without failing the build:

- name: Run tests
  run: npm test
  continue-on-error: true

- name: Run link check summary
  run: node test/link-check-summary.js
  continue-on-error: true

Manual Testing

# Run full test suite (includes link checking)
npm test

# Run only link checking with summary
npm run link-check

Sample Output

📝 Link Check Summary
==================================================
📄 Files checked: 2

🔍 Checking: content/zh/blog/new-post.md
   📊 Links: 3 external, 2 internal, 1 images

🔍 Checking: content/en/docs/guide.md
   📊 Links: 1 external, 0 internal, 0 images

📊 Summary Statistics
------------------------------
🔗 Total links found: 6
🖼️  Total images found: 1
🌐 External links: 4
📁 Internal links: 2
❌ Broken external links: 1
❌ Broken internal links: 0

🚫 Broken External Links
------------------------------
🔗 content/zh/blog/new-post.md: https://example-broken-link.com
   Status: 404

✅ All internal links are valid!

❌ Found 1 broken link(s) in total.
==================================================

Configuration

Timeout Settings

External link checks time out after 10 seconds by default. To change this, edit the timeout option (in milliseconds) in test/link-check-summary.js:

const req = lib.request(url, { method: 'HEAD', timeout: 10000 }, (res) => {

Link Filtering

  • Root-relative links (starting with /) are skipped automatically because Hugo handles them
  • To ignore specific domains, consider adding them to the existing scripts/ignore-urls.txt used for htmlproofer-based checking

Supported Link Types

  • ✅ External HTTP/HTTPS links
  • ✅ Internal relative links
  • ✅ Image references (both external and internal)
  • ❌ Root-relative links (skipped - handled by Hugo)
  • ❌ Hash/anchor links (not currently validated)
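The categories listed above can be expressed as a small classifier plus a filesystem check for internal links. This is a sketch only. classifyLink and internalLinkExists are hypothetical names, and resolving internal links relative to the directory of the source file is an assumption about how the checker works.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical classifier mirroring the supported/skipped categories.
// Other schemes (for example mailto:) are not covered here and fall through to 'internal'.
function classifyLink(url) {
  if (/^https?:\/\//i.test(url)) return 'external';
  if (url.startsWith('#')) return 'skipped-anchor';
  if (url.startsWith('/')) return 'skipped-root-relative';
  return 'internal';
}

// Assumption: an internal link is resolved against the directory of the file containing it.
function internalLinkExists(sourceFile, url) {
  const target = url.split('#')[0].split('?')[0]; // drop fragment and query string
  if (target === '') return true; // nothing left to look up on disk
  return fs.existsSync(path.resolve(path.dirname(sourceFile), target));
}

console.log(classifyLink('https://example.com'));
console.log(classifyLink('/docs/'));
console.log(classifyLink('../guide.md'));
```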

Error Handling

The system handles various error scenarios:

  • Network timeouts: 10-second timeout for external requests
  • DNS resolution failures: Reports EAI_AGAIN errors
  • HTTP errors: Reports status codes (404, 500, etc.)
  • Missing files: Checks local filesystem for internal links
  • Malformed URLs: Gracefully handles parsing errors
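The error scenarios above could be handled in one place along these lines. This is a hedged sketch, not the actual contents of test/link-check-summary.js. checkExternalLink and the { ok, detail } result shape are hypothetical, and treating every status below 400 as reachable is an assumption.

```javascript
const http = require('node:http');
const https = require('node:https');

// Hypothetical helper: always resolves to { ok, detail } instead of rejecting,
// so one bad link never aborts the whole run.
function checkExternalLink(url, timeoutMs = 10000) {
  return new Promise((resolve) => {
    let parsed;
    try {
      parsed = new URL(url);
    } catch (err) {
      resolve({ ok: false, detail: 'Malformed URL' });
      return;
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      resolve({ ok: false, detail: 'Unsupported protocol' });
      return;
    }
    const lib = parsed.protocol === 'https:' ? https : http;
    const req = lib.request(parsed, { method: 'HEAD', timeout: timeoutMs }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve({ ok: res.statusCode < 400, detail: `Status: ${res.statusCode}` });
    });
    // The timeout option only emits an event; the request has to be destroyed explicitly.
    req.on('timeout', () => req.destroy(new Error('ETIMEDOUT')));
    // DNS failures surface here with codes such as EAI_AGAIN.
    req.on('error', (err) => resolve({ ok: false, detail: err.code || err.message }));
    req.end();
  });
}

// Usage: a malformed URL is reported rather than thrown.
checkExternalLink('not a url').then((result) => console.log(result));
```

Resolving with a result object rather than rejecting keeps the summary logic simple: every link produces exactly one entry, whether it succeeded, timed out, or failed to parse.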

Contributing

When adding new link checking features:

  1. Update test/link-check-summary.js for core functionality
  2. Ensure backward compatibility with existing test/content.test.js
  3. Test with both valid and broken links
  4. Update this documentation

Related Files

  • test/link-check-summary.js - Core link checking and summary logic
  • test/content.test.js - Integration with existing test suite
  • .github/workflows/deploy.yml - CI integration
  • scripts/broken-link-checker.sh - Alternative full-site checking (htmlproofer)