You can often scrape information out of pages, provided the pages are reasonably consistent in structure. Scraping relies on patterns such as XPath expressions, regular expressions (regex), or both. Note that you can use an LLM to help you define the patterns, without then running the LLM against every piece of content.
Note: a crawler follows links to collect the URLs and basic information for all the pages of a site or site section. A scraper pulls arbitrary information out of those pages.
Chimera has extensive scraping capabilities, including defining and testing patterns that include an XPath and a regex. To aid analysis, Chimera automatically pulls out six fields for each pattern when crawling: whether or not there was a match, the count of matches, the first match, the second match, the third match, and a comma-separated list of the matches. These multiple fields make it easier to do your content analysis.
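
To make the six fields concrete, here is a minimal sketch in Python of how an XPath plus a regex could be turned into that kind of summary. It is not Chimera's implementation: the function name `scrape_fields`, the field names, and the sample page are all assumptions made for illustration, and it uses the standard library's limited XPath support, which expects well-formed markup.

```python
import re
import xml.etree.ElementTree as ET


def scrape_fields(page, xpath, regex):
    """Apply an XPath, then a regex, and summarize the matches in six fields.

    Hypothetical helper: the field names below are illustrative only.
    """
    root = ET.fromstring(page)  # requires well-formed markup
    matches = []
    for node in root.findall(xpath):
        # Collect all text inside the node, then run the regex over it.
        text = "".join(node.itertext())
        matches.extend(m.group(0) for m in re.finditer(regex, text))

    def nth(i):
        # Return the i-th match, or an empty string if there are fewer matches.
        return matches[i] if i < len(matches) else ""

    return {
        "matched": bool(matches),
        "match_count": len(matches),
        "first_match": nth(0),
        "second_match": nth(1),
        "third_match": nth(2),
        "all_matches": ", ".join(matches),
    }


# Made-up sample page: topic tags live in one consistent place on the page.
PAGE = """<html><body>
  <div class="tags"><a>Topic: Pricing</a><a>Topic: Support</a></div>
  <div class="body"><p>Topic: Ignored</p></div>
</body></html>"""

# The XPath narrows to the tag links; the regex keeps the word after "Topic: ".
fields = scrape_fields(PAGE, ".//div[@class='tags']/a", r"(?<=Topic: )\w+")
print(fields)
```

Having the match flag and count as separate fields lets you quickly find pages where a pattern does not apply, while the first three matches and the combined list cover both single-value and multi-value content.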