findAndValidateFinalLinksForQueue

findAndValidateFinalLinksForQueue ( int|default:1 _nbTasks , regex _regexExclusion ) : map

A crawled URL may redirect, in which case there are two URLs: the starting URL and the final URL reached after all the redirects. This function traverses the stack of pending URLs, finds their final URLs, replaces the pending URLs with them, and excludes those that match the link exclusion regular expression. It is designed to be economical: a link that has already been checked is not checked again, even if the function is called several times.
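The behaviour described above can be illustrated with a minimal Python sketch. It is not the actual implementation: the names `REDIRECTS`, `resolve_final` and `validate_queue` are hypothetical, and a plain dictionary stands in for real HTTP redirects so that the example runs without network access.

```python
import re

# Hypothetical redirect table standing in for real HTTP responses.
REDIRECTS = {
    "http://example.com/a": "http://example.com/b",
    "http://example.com/b": "http://example.com/final",
    "http://example.com/old": "http://example.com/private/page",
}

def resolve_final(url, redirects, max_hops=10):
    """Follow redirects until a URL with no further redirect is reached."""
    seen = set()
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url

def validate_queue(queue, redirects, exclusion=None, cache=None):
    """Replace pending URLs by their final URLs and drop excluded ones.

    `cache` remembers already resolved links so that a second call
    does not resolve them again.
    """
    cache = {} if cache is None else cache
    pattern = re.compile(exclusion) if exclusion else None
    result = []
    for url in queue:
        if url not in cache:
            cache[url] = resolve_final(url, redirects)
        final = cache[url]
        if pattern and pattern.search(final):
            continue  # final URL matches the exclusion regex
        if final not in result:
            result.append(final)
    return result, cache

queue = ["http://example.com/a", "http://example.com/old", "http://example.com/c"]
finals, cache = validate_queue(queue, REDIRECTS, exclusion=r"/private/")
```

Passing the returned `cache` to a later call reuses the earlier results, which mirrors the "economical" behaviour: links already resolved are not resolved again.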

Parameters

_nbTasks (optional)

Default: 1. Number of parallel tasks used to test URLs. This function can take a long time: without parallelization, it can double the crawling time.

_regexExclusion (optional)

Regular expression matching the links to exclude from this analysis.
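To give an idea of why `_nbTasks` matters, here is a hedged Python sketch of testing URLs with several parallel workers. The helper names `fake_resolve` and `resolve_all` are hypothetical and do not belong to this API; `fake_resolve` simulates the network check with a fixed rule instead of a real request.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_resolve(url):
    """Hypothetical stand-in for a network check: returns the final URL."""
    # Assumption for the example only: http URLs redirect to https.
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url

def resolve_all(urls, resolve, nb_tasks=1):
    """Resolve every URL using nb_tasks worker threads."""
    with ThreadPoolExecutor(max_workers=nb_tasks) as pool:
        # pool.map preserves the input order of urls.
        finals = list(pool.map(resolve, urls))
    return dict(zip(urls, finals))

pending = ["http://example.com/1", "https://example.com/2"]
mapping = resolve_all(pending, fake_resolve, nb_tasks=4)
```

With real network calls, most of the time is spent waiting for responses, so raising the number of workers typically shortens the total duration; with a single worker the URLs are tested one after another.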