I have a large file with URL strings such as:
http://tg24.sky.it/mondo/2020/05/01/corea-nord-kim-riappare.html
http://tg24.sky.it/mondo/01/05/2020/corea-nord-kim-riappare.html
http://tg24.sky.it/mondo/2020/04/30/corea-nord-kim-riappare.html
http://tg24.sky.it/mondo/04/30/2020/corea-nord-kim-riappare.html
I need to extract only the URLs with the date 01-05-2020, in whatever format it arrives, with or without separators.
So I wrote the following regex:
^.*/?0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)/?.*$
It works fine, but it also matches false positives such as:
XXXX/5/5/5/YYYYY
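A minimal Python sketch that reproduces the false positive. It assumes the pattern's leading and trailing parts are `.*` (anything before and after the date). Each of the three groups independently accepts 1, 5, 20 or 2020, so nothing stops the same value from filling all three slots:

```python
import re

# The regex from the question. Assumption: the leading and trailing parts are ".*".
# Each group accepts 1, 5, 20 or 2020 on its own, with no memory of the others.
ORIGINAL = re.compile(
    r"^.*/?0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)/?.*$"
)

samples = [
    "http://tg24.sky.it/mondo/2020/05/01/corea-nord-kim-riappare.html",
    "http://tg24.sky.it/mondo/2020/04/30/corea-nord-kim-riappare.html",
    "XXXX/5/5/5/YYYYY",
]

for url in samples:
    # "5/5/5" matches because every group is free to pick 5.
    print(bool(ORIGINAL.search(url)), url)
```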
So I understand that I need to enhance it so that each component is used only once: if the first part matches MM, then the second should look only for DD or YYYY, and the third only for whatever is left.
Any thoughts on how to do it?
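One way to get the "each component only once" behaviour described above is to drop the three interchangeable groups and instead generate every ordering of the three components (six permutations), joined with alternation. This is a minimal sketch under my own assumptions: the helper names `build_pattern` and `has_target_date` are hypothetical, the digit look-arounds are an added boundary check, and separator-less dates are only accepted in zero-padded form (01, 05, 2020), because with optional zeros a run like "2015" would otherwise read as 20 + 1 + 5.

```python
import itertools
import re

# Components of the target date. Which of 1 and 5 is the day and which is the
# month does not matter, because every ordering is generated below.
LOOSE = [r"0?1", r"0?5", r"(?:20)?20"]  # allowed when separators are present
STRICT = [r"01", r"05", r"2020"]        # assumption: fixed width when there are none


def build_pattern() -> re.Pattern:
    """Hypothetical helper: one alternative per ordering, each component used once."""
    alternatives = []
    for order in itertools.permutations(range(3)):
        # With separators, e.g. 2020/05/01, 1-5-20 or 05/01/2020.
        alternatives.append(r"[/-]".join(LOOSE[i] for i in order))
        # Without separators, e.g. 20200501; fixed width keeps "2015" from
        # being read as 20 + 1 + 5.
        alternatives.append("".join(STRICT[i] for i in order))
    # The digit look-arounds stop a component from matching inside a longer
    # number, such as the "20" at the end of "2020" or the "5" in "15".
    return re.compile(r"(?<!\d)(?:" + "|".join(alternatives) + r")(?!\d)")


DATE_RE = build_pattern()


def has_target_date(url: str) -> bool:
    """Hypothetical helper: True if the URL contains the target date in any order."""
    return DATE_RE.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://tg24.sky.it/mondo/2020/05/01/corea-nord-kim-riappare.html",
        "http://tg24.sky.it/mondo/01/05/2020/corea-nord-kim-riappare.html",
        "http://tg24.sky.it/mondo/2020/04/30/corea-nord-kim-riappare.html",
        "http://tg24.sky.it/mondo/04/30/2020/corea-nord-kim-riappare.html",
        "XXXX/5/5/5/YYYYY",
    ]
    for url in samples:
        print(has_target_date(url), url)
```

For the large file, iterating over the opened file and keeping the lines where `has_target_date(line)` is true should be enough. The pattern ends up with 12 alternatives (6 orderings, with and without separators), which is still small for the regex engine, and it avoids the lookahead or backreference tricks a single combined pattern would need.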
Thanks,
Dani