Excluding a URL with the AWS Q web crawler using regex doesn't work

I am using the AWS Q web crawler to crawl information from my company website. One URL that ends with .mp3 cannot be indexed. The sync status is shown as completed, but it is yellow (I guess this means there were some issues). So I went to edit the sync scope, where I see there is a "crawl url pattern" to exclude and a "url index pattern" to exclude. I tried the regex ^(https?|ftp|file)://(www.)?(.*?).(mp3)$ to match the URL, and when I tested it, it did match. But after I synced again, it still didn't work: the URL is still being crawled. I even tried using the URL itself as the pattern, and that still doesn't work. Am I doing anything wrong? Has anyone run into a situation like this?
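For anyone who wants to reproduce the "I tested it and it matched" step, here is a minimal sketch in Python. The example URLs and the helper name `is_excluded` are hypothetical, and Python's `re` module may not behave exactly like the regex engine the crawler uses, so treat this only as a local sanity check. It also shows that the dots in the pattern above are unescaped and therefore match any character; the stricter variant is my assumption about the intent, not something taken from the AWS documentation.

```python
import re

# The pattern from the question, copied unchanged.
ORIGINAL = r"^(https?|ftp|file)://(www.)?(.*?).(mp3)$"

# A stricter variant with the literal dots escaped (assumed intent:
# "any URL whose path ends in .mp3").
STRICT = r"^(https?|ftp|file)://(www\.)?.*\.mp3$"


def is_excluded(pattern: str, url: str) -> bool:
    """Return True if the whole URL matches the exclusion pattern."""
    return re.fullmatch(pattern, url) is not None


# Hypothetical URLs, used only for illustration.
urls = [
    "https://www.example.com/audio/episode1.mp3",
    "https://www.example.com/audio/episode1xmp3",
    "https://www.example.com/page.html",
]

for url in urls:
    print(url, is_excluded(ORIGINAL, url), is_excluded(STRICT, url))
```

Both patterns match the real .mp3 URL, so the regex itself is probably not the reason the exclusion was ignored. The only difference is that the original pattern also matches URLs that merely end in "mp3" without a dot.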

PS: I think I solved it. I now use only the URL Crawl Exclusion pattern with the regex, and not the Index URL Exclusion pattern. I think there are many issues with Q Business, and the best way to solve them is to delete the Q app and rebuild it.

hd
asked 2 months ago, 412 views
1 Answer
AWS
James_J
answered 2 months ago
  • I mean I want to exclude any URL that ends with .mp3, but the URL is still not excluded after I add the regex.