Crawl & Access-Control Test Site
================================

Purpose:
- Verify whether a crawler (e.g., a web defacement scanner) can reach subdirectories from the top pages.
- Test behavior for directories with access restrictions (admin/private), for uploads, and for files that are linked vs. not linked.

Structure:
- /public/index.html                : Top page linking to several resources.
- /public/page1.html                : Links to the subdirectory page page2.
- /public/hidden_direct.html        : Linked only from /public/index.html.
- /public/subdir_public2/page2.html : Deeper page.
- /uploads/file.txt                 : Upload sample (readable).
- /uploads/script.php               : PHP script placed in uploads to test whether execution is blocked.
- /admin/                           : Admin area (should be restricted).
- /private/secret.txt               : Private file (access should be denied).
- /robots.txt                       : Disallows /admin and /private for well-behaved bots.
- /sitemap.xml                      : Lists the public pages (bots may use it).

Test Plan (recommended):
1) Deploy the site with DocumentRoot pointing to the parent of 'public' so that URLs map as
   /public/... Alternatively, map DocumentRoot to the site root and access the paths as given.
2) Baseline (a script that automates these checks is included at the end of this README):
   - curl http://testsite.local/public/index.html                -> 200
   - curl http://testsite.local/public/page1.html                -> 200
   - curl http://testsite.local/public/subdir_public2/page2.html -> 200
   - curl http://testsite.local/public/hidden_direct.html        -> 200
   - curl http://testsite.local/uploads/file.txt                 -> 200
   - curl http://testsite.local/uploads/script.php               -> 200 (file present); execution depends on server config
   - curl http://testsite.local/admin/                           -> expected 403 or 401
   - curl http://testsite.local/private/secret.txt               -> expected 403
3) Crawler behavior tests:
   - Run the crawler/scanner starting at /public/index.html and observe which paths it attempts to fetch.
   - Check whether it discovers:
     * linked pages (it should)
     * deeper pages via link traversal (it should)
     * /uploads/script.php via a link (if linked) or by brute force (it may)
     * /private/secret.txt and /admin/ (these should be attempted, but the server should block access)
4) Robots & Sitemap:
   - Some scanners respect robots.txt and sitemap.xml; others ignore them.
   - robots.txt disallows /admin and /private; confirming whether the scanner respects it is part of the test.
   - Example robots.txt and sitemap.xml contents are shown at the end of this README.

Apache enforcement examples:
- To deny access to /private and restrict /admin (Basic auth example), use vhost rules.
  See the example at the end of this README.

Expected outcomes:
- A properly configured server should return 403 for /private/* and 401 or 403 for /admin/*.
- Public links should be crawlable and discoverable.
- The uploads directory should serve static files but not execute PHP if "php_admin_flag engine Off"
  is set under uploads.

Notes:
- For accurate crawler testing, ensure no other redirects or rewrites interfere.
- Map the test domain in /etc/hosts: 127.0.0.1 testsite.local
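
Example baseline check script:
The following shell sketch automates the baseline curl checks from step 2 of the test plan.
It compares HTTP status codes only; the hostname testsite.local and the expected codes are
taken from the baseline list above, and the expected code for /admin/ may be 401 rather than
403 depending on how the admin area is restricted, so adjust the expectations as needed.

    #!/bin/sh
    # Baseline status-code checks for the test site (sketch; adjust expectations to your setup).
    BASE="http://testsite.local"

    check() {
        path="$1"; expected="$2"
        # -o /dev/null discards the body; -w '%{http_code}' prints only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")
        if [ "$code" = "$expected" ]; then
            echo "OK   $path -> $code"
        else
            echo "FAIL $path -> $code (expected $expected)"
        fi
    }

    check /public/index.html                200
    check /public/page1.html                200
    check /public/subdir_public2/page2.html 200
    check /public/hidden_direct.html        200
    check /uploads/file.txt                 200
    check /uploads/script.php               200   # file is served; execution depends on server config
    check /admin/                           403   # may be 401 if Basic auth is configured
    check /private/secret.txt               403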
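
Example robots.txt:
A minimal robots.txt consistent with the description above (disallow /admin and /private for
well-behaved bots). The Sitemap line is optional and assumes the testsite.local host mapping
from the Notes.

    User-agent: *
    Disallow: /admin
    Disallow: /private

    Sitemap: http://testsite.local/sitemap.xml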
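
Example sitemap.xml:
A minimal sitemap listing the public pages from the Structure section. The absolute URLs
assume the testsite.local mapping; adjust the host and the page list to match your deployment.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://testsite.local/public/index.html</loc></url>
      <url><loc>http://testsite.local/public/page1.html</loc></url>
      <url><loc>http://testsite.local/public/subdir_public2/page2.html</loc></url>
      <url><loc>http://testsite.local/public/hidden_direct.html</loc></url>
    </urlset>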
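
Example Apache vhost rules:
A sketch of the enforcement rules referenced above, assuming Apache 2.4 (Require syntax),
mod_php for the uploads setting, and a DocumentRoot of /var/www/testsite; the filesystem
paths and the htpasswd location are placeholders, not part of the test site itself.

    <VirtualHost *:80>
        ServerName testsite.local
        DocumentRoot /var/www/testsite           # assumed path; adjust to your deployment

        # Deny all access to /private (expected 403).
        <Directory /var/www/testsite/private>
            Require all denied
        </Directory>

        # Restrict /admin with Basic auth (401 without credentials).
        <Directory /var/www/testsite/admin>
            AuthType Basic
            AuthName "Admin area"
            AuthUserFile /etc/apache2/.htpasswd  # assumed location; create with htpasswd
            Require valid-user
        </Directory>

        # Serve static files from /uploads but do not execute PHP there (mod_php only).
        <Directory /var/www/testsite/uploads>
            php_admin_flag engine Off
        </Directory>
    </VirtualHost>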