The Data Collection Portal provides, under its Tools section, a crawling tool called Tag Crawler. Built by our teams, it checks the hits generated when each URL you need to verify is loaded.
It can help you identify tagging issues, perimeters not yet covered by tagging, or potential improvements to make on specific sections of your sites.
Crawls are listed per project, and Tag Crawler's homepage shows a short summary of the results of the latest crawls.
If you open a project, you will see all the crawls that have been run with the same configuration.
Each crawl is also given a Crawl ID so you can be sure you are referring to the right session.
To create a crawl project, you need to reach out to our support services, or your Key Account Manager/Customer Success Manager.
They will set up your crawl project on our internal interfaces, specifying the required configuration.
Note: To help our teams create your crawl, please review Tag Crawler's main rules and requirements below.
- Crawled URLs do not follow any given order
- Crawls can be planned in advance and run at any time
- Crawls can be restricted to a specific folder in your URL path
- iframes will also be crawled and counted in your results
- Tag Crawler only checks hits loaded on the page view (it does not click or fill in information)
- Logged-in platforms can be crawled but require an AT study on the authentication
- Crawls can only be set up on websites, since the tool relies on URLs
- To let Tag Crawler access your site, please check with your IT teams whether whitelisting is required
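Beyond firewall whitelisting, a site's robots.txt policy can also block an unfamiliar crawler. As a rough, illustrative check (the user-agent string `TagCrawler` here is hypothetical; ask support for the real one), you can test a robots.txt policy against the URLs you want crawled:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string -- confirm the real one with support
CRAWLER_UA = "TagCrawler"

def is_allowed(robots_txt: str, url: str, user_agent: str = CRAWLER_UA) -> bool:
    """Check whether a robots.txt policy would let the crawler fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Example policy: the crawler may fetch everything except /private/
robots = """
User-agent: TagCrawler
Disallow: /private/

User-agent: *
Disallow: /
"""

print(is_allowed(robots, "https://example.com/products/"))  # True
print(is_allowed(robots, "https://example.com/private/x"))  # False
```

If such a check fails on URLs you expect to crawl, raise it with your IT teams alongside the whitelisting question.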
When requesting a crawl project, be ready to answer the following configuration questions:
- What is the start URL (first URL to crawl)?
- What is the site number?
- Should the project crawl all subdomains, or a specific one?
- Does the site have a CDDC setup?
- Does the site have a specific pixel path?
- Should the crawl ignore QueryStrings?
- Should the crawl ignore redirections?
- Should the crawl simulate a mobile device?
- Should the crawl follow a sitemap? If so should it be followed strictly?
- What's the maximum depth of the crawl (path folders)?
- What's the maximum number of URLs to crawl?
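To see how these answers shape the crawl, here is a minimal sketch of a configuration and the scoping it implies. All field names are illustrative only; the real setup is done by our teams on internal interfaces:

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass
class CrawlConfig:
    # Hypothetical fields mirroring the questions above
    start_url: str
    site_number: int
    all_subdomains: bool = False
    ignore_querystrings: bool = True
    ignore_redirections: bool = False
    simulate_mobile: bool = False
    max_depth: int = 3      # maximum number of path folders
    max_urls: int = 1000

def in_scope(config: CrawlConfig, url: str) -> bool:
    """Rough sketch of the scoping a crawl configuration implies."""
    start, target = urlsplit(config.start_url), urlsplit(url)
    if config.all_subdomains:
        # Accept any subdomain of the start URL's registered domain
        root = ".".join(start.hostname.split(".")[-2:])
        if target.hostname != root and not target.hostname.endswith("." + root):
            return False
    elif target.hostname != start.hostname:
        return False
    depth = len([part for part in target.path.split("/") if part])
    return depth <= config.max_depth

cfg = CrawlConfig(start_url="https://www.example.com/", site_number=123456)
print(in_scope(cfg, "https://www.example.com/products/shoes/"))  # True
print(in_scope(cfg, "https://www.example.com/a/b/c/d/e"))        # False (deeper than max_depth)
```

A tighter `max_depth` and `max_urls` keep crawls fast and focused; widen them only when you genuinely need full-site coverage.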
Once the crawl has been set up and run by our teams, you will be notified that the results are available in the interface.
Here is what you will find:
- Number of URLs crawled
- Number of tagged URLs
- Tagging ratio
- Results per URL, including:
- Order (e.g. 3.2 means the URL was crawled third, on its second crawl attempt)
- URL crawled
- Type of hit detected (other hits can be triggered automatically on a page view)
- Label (event name detected)
- Site (site number)
- Tagged URLs (indicates whether the page is tagged or not)
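If you work with the results outside the interface, the columns above are easy to process. A small sketch, using hypothetical row and column names (the actual export may label them differently):

```python
def parse_order(order: str) -> tuple[int, int]:
    """Split an Order value like '3.2' into (crawl position, attempt number)."""
    position, attempt = order.split(".")
    return int(position), int(attempt)

def tagging_ratio(rows: list[dict]) -> float:
    """Share of crawled URLs on which a hit was detected."""
    tagged = sum(1 for row in rows if row["tagged"])
    return tagged / len(rows)

# Hypothetical result rows mirroring the columns listed above
rows = [
    {"order": "1.1", "url": "https://example.com/",  "tagged": True},
    {"order": "2.1", "url": "https://example.com/a", "tagged": True},
    {"order": "3.2", "url": "https://example.com/b", "tagged": False},
]
print(parse_order("3.2"))   # (3, 2): crawled third, second attempt
print(tagging_ratio(rows))  # 2 of 3 URLs tagged
```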
The Tag Crawler result interface allows you to interact with your crawl results.
You can filter the results of most columns in the results table by clicking the 3-dot icon.
You can sort the results of most columns by clicking the column title or the 3-dot icon.
You can isolate specific URLs, event types, or labels by typing keywords in the search bar right above the results table.
Show only untagged pages
You can isolate untagged pages by ticking the dedicated checkbox in the top left corner of the table.
Export your results
You can click the dedicated button above the table to generate a csv file containing the results of your crawl.
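The exported csv is handy for follow-up work, such as building a list of pages to send to the tagging team. A minimal sketch, assuming hypothetical column names (`url`, `tagged`) that may differ in the actual export:

```python
import csv
import io

# Hypothetical csv export; real column names may differ
export = """url,label,tagged
https://example.com/,page.display,true
https://example.com/old,,false
https://example.com/new,page.display,true
"""

def untagged_urls(csv_text: str) -> list[str]:
    """Return the URLs on which the crawl found no tag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader if row["tagged"].lower() != "true"]

print(untagged_urls(export))  # ['https://example.com/old']
```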
Check crawling rules
By clicking the crawling rules button, you can review the full configuration that was set for the project.