Excluding a URL with AWS Q WEBCRAWLER using regex doesn't work

I am using AWS Q WEBCRAWLER to crawl information from my company website. There is one URL ending in .mp3 that cannot be indexed. The sync status is shown as completed, but it is yellow (I guess this means there were some issues). So I went into edit, and in the sync scope I saw that there is a "crawl url pattern" to exclude and a "url index pattern" to exclude. I tried the regex ^(https?|ftp|file)://(www.)?(.*?).(mp3)$ to match the URL, and when I tested it, it did match. But after I synced again, it still didn't work: the URL is still being crawled. I even tried using the URL itself as the pattern, and that still didn't work. Am I doing anything wrong? Has anyone run into a situation like this?
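For anyone who wants to reproduce the "I tested it and it did match" step, here is a minimal sketch in Python. The URLs are hypothetical placeholders, and the escaped variant of the pattern is my own assumption, not something from the console. Python's `re` module may also behave differently from whatever regex engine the crawler uses, so a local match only shows that the pattern itself is plausible.

```python
import re

# Pattern exactly as quoted in the post. The dots are unescaped,
# so each "." matches any single character, not only a literal dot.
ORIGINAL = r"^(https?|ftp|file)://(www.)?(.*?).(mp3)$"

# Stricter variant with the literal dots escaped (an assumption, not from the post).
ESCAPED = r"^(https?|ftp|file)://(www\.)?(.*?)\.(mp3)$"

# Hypothetical URLs, for illustration only.
urls = [
    "https://www.example.com/media/episode1.mp3",
    "https://example.com/about",
    "https://example.com/notes-mp3",
]


def matches(pattern: str, url: str) -> bool:
    """Return True if the whole URL matches the pattern (anchored by ^ and $)."""
    return re.match(pattern, url) is not None


for url in urls:
    print(url, "| original:", matches(ORIGINAL, url), "| escaped:", matches(ESCAPED, url))
```

Both patterns match the .mp3 URL and reject the /about URL. Only the original pattern matches the third URL, because its unescaped dot accepts the "-" before "mp3". That makes the original broader than intended, but it does not explain why the .mp3 URL was still crawled.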

PS: I think I solved it. I now use only the URL Crawl Exclusion pattern with the regex, and not the Index URL Exclusion pattern. I think there are many issues with Q Business, and the best way to solve them is to delete and rebuild the Q app.

