Crawl & Access-Control Test Site
================================

Purpose:
- Verify whether a crawler (e.g., a web defacement scanner) can reach subdirectories from the top pages.
- Test behavior for directories with access restrictions (admin/private), for uploads, and for files that are linked vs. not linked.

Structure:
- /public/index.html                : Top page linking to several resources.
- /public/page1.html                : Links to the subdirectory page page2.
- /public/hidden_direct.html        : Linked only from /public/index.html.
- /public/subdir_public2/page2.html : Deeper page.
- /uploads/file.txt                 : Upload sample (readable).
- /uploads/script.php               : PHP script placed in uploads to test whether execution is blocked.
- /admin/                           : Admin area (should be restricted).
- /private/secret.txt               : Private file (access should be denied).
- /robots.txt                       : Disallows /admin and /private for well-behaved bots.
- /sitemap.xml                      : Lists the public pages (bots may use it).

Test Plan (recommended):
1) Deploy the site with DocumentRoot pointing to the parent of 'public' so that URLs map as
   /public/... Alternatively, map DocumentRoot to the site root and access the paths as given.
2) Baseline (a script that automates these checks is included at the end of this README):
   - curl http://testsite.local/public/index.html                -> 200
   - curl http://testsite.local/public/page1.html                -> 200
   - curl http://testsite.local/public/subdir_public2/page2.html -> 200
   - curl http://testsite.local/public/hidden_direct.html        -> 200
   - curl http://testsite.local/uploads/file.txt                 -> 200
   - curl http://testsite.local/uploads/script.php               -> 200 (file present); execution depends on server config
   - curl http://testsite.local/admin/                           -> expected 403 or 401
   - curl http://testsite.local/private/secret.txt               -> expected 403
3) Crawler behavior tests:
   - Run the crawler/scanner starting at /public/index.html and observe which paths it attempts to fetch.
   - Check whether it discovers:
     * linked pages (it should)
     * deeper pages via link traversal (it should)
     * /uploads/script.php via a link (if linked) or by brute force (it may)
     * /private/secret.txt and /admin/ (these should be attempted, but the server should block access)
4) Robots & Sitemap:
   - Some scanners respect robots.txt and sitemap.xml; others ignore them.
   - robots.txt disallows /admin and /private; confirming whether the scanner respects it is part of the test.
   - Example robots.txt and sitemap.xml contents are shown at the end of this README.

Apache enforcement examples:
- To deny access to /private and restrict /admin (Basic auth example), use vhost rules.
  See the example at the end of this README.

Expected outcomes:
- A properly configured server should return 403 for /private/* and 401 or 403 for /admin/*.
- Public links should be crawlable and discoverable.
- The uploads directory should serve static files but not execute PHP if "php_admin_flag engine Off"
  is set under uploads.

Notes:
- For accurate crawler testing, ensure no other redirects or rewrites interfere.
- Map the test domain in /etc/hosts: 127.0.0.1 testsite.local
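
Example baseline check script:
The following shell sketch automates the baseline curl checks from step 2 of the test plan.
It compares HTTP status codes only; the hostname testsite.local and the expected codes are
taken from the baseline list above, and the expected code for /admin/ may be 401 rather than
403 depending on how the admin area is restricted, so adjust the expectations as needed.

    #!/bin/sh
    # Baseline status-code checks for the test site (sketch; adjust expectations to your setup).
    BASE="http://testsite.local"

    check() {
        path="$1"; expected="$2"
        # -o /dev/null discards the body; -w '%{http_code}' prints only the status code.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")
        if [ "$code" = "$expected" ]; then
            echo "OK   $path -> $code"
        else
            echo "FAIL $path -> $code (expected $expected)"
        fi
    }

    check /public/index.html                200
    check /public/page1.html                200
    check /public/subdir_public2/page2.html 200
    check /public/hidden_direct.html        200
    check /uploads/file.txt                 200
    check /uploads/script.php               200   # file is served; execution depends on server config
    check /admin/                           403   # may be 401 if Basic auth is configured
    check /private/secret.txt               403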
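
Example robots.txt:
A minimal robots.txt consistent with the description above (disallow /admin and /private for
well-behaved bots). The Sitemap line is optional and assumes the testsite.local host mapping
from the Notes.

    User-agent: *
    Disallow: /admin
    Disallow: /private

    Sitemap: http://testsite.local/sitemap.xml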
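
Example sitemap.xml:
A minimal sitemap listing the public pages from the Structure section. The absolute URLs
assume the testsite.local mapping; adjust the host and the page list to match your deployment.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://testsite.local/public/index.html</loc></url>
      <url><loc>http://testsite.local/public/page1.html</loc></url>
      <url><loc>http://testsite.local/public/subdir_public2/page2.html</loc></url>
      <url><loc>http://testsite.local/public/hidden_direct.html</loc></url>
    </urlset>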
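
Example Apache vhost rules:
A sketch of the enforcement rules referenced above, assuming Apache 2.4 (Require syntax),
mod_php for the uploads setting, and a DocumentRoot of /var/www/testsite; the filesystem
paths and the htpasswd location are placeholders, not part of the test site itself.

    <VirtualHost *:80>
        ServerName testsite.local
        DocumentRoot /var/www/testsite           # assumed path; adjust to your deployment

        # Deny all access to /private (expected 403).
        <Directory /var/www/testsite/private>
            Require all denied
        </Directory>

        # Restrict /admin with Basic auth (401 without credentials).
        <Directory /var/www/testsite/admin>
            AuthType Basic
            AuthName "Admin area"
            AuthUserFile /etc/apache2/.htpasswd  # assumed location; create with htpasswd
            Require valid-user
        </Directory>

        # Serve static files from /uploads but do not execute PHP there (mod_php only).
        <Directory /var/www/testsite/uploads>
            php_admin_flag engine Off
        </Directory>
    </VirtualHost>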