Category Fields for Content Inventories, Audits, & other Analysis

Field Type: Category

Categorization is essential for content analysis. Some of the uses include:

  • Seeing how pervasive something is across groups, such as average reading level by site section. With two categories, we can slice two ways at once (such as reading level by content type and site section).
  • Categories are often useful for making decisions about content (such as deciding to delete all blog posts on a site that are older than a certain age).
  • Sometimes categories are one of the main things we are analyzing, for instance to understand how pervasive certain metadata values are by content type on a site that is currently unstructured.

If we take the traditional view of a content inventory or audit, we have rows representing each page (so each row has a unique URL) and columns for things like the meta description or crawl depth. These columns are the fields available to us in our content analysis.
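As a minimal sketch of the rows-and-columns view and of slicing two ways, the Python below treats the inventory as a list of dicts (one per URL, one key per field) and averages reading level across every combination of two category fields. The field names (`content_type`, `site_section`, `reading_level`), the `average_by` helper, and the example.com URLs are hypothetical, not taken from any particular inventory tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory: one row per URL, one key per field (column).
inventory = [
    {"url": "https://example.com/blog/a", "content_type": "Blog Post",
     "site_section": "News", "reading_level": 9.0},
    {"url": "https://example.com/blog/b", "content_type": "Blog Post",
     "site_section": "News", "reading_level": 11.0},
    {"url": "https://example.com/events/c", "content_type": "Event",
     "site_section": "Programs", "reading_level": 7.5},
]

def average_by(rows, *category_fields, value_field):
    """Average a numeric field for every combination of the given category fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in category_fields)
        groups[key].append(row[value_field])
    return {key: mean(values) for key, values in groups.items()}

# Slice two ways: reading level by content type and site section.
print(average_by(inventory, "content_type", "site_section", value_field="reading_level"))
# {('Blog Post', 'News'): 10.0, ('Event', 'Programs'): 7.5}
```

Passing a single category field (for example, only `"site_section"`) gives the one-way slice from the first bullet; a spreadsheet pivot table or a pandas `groupby` does the same job.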

①. Define what you are trying to accomplish.

Your content analysis needs to be grounded in your analysis goal.

②. Define your analysis approach.

Size and complexity of your digital presence

Size and complexity of your digital presence should drive your content analysis approach.

Content analysis does not necessarily mean opening up a spreadsheet. Before diving in, you should define your basic approach to the analysis.

③. Select fields that serve your goal, grounded in your prioritized list of the questions you want answered.

Content Type
Content Type (semantic type of content, such as Product Page or Event) is usually an extremely effective way to group and look for patterns across a digital presence.
General Usefulness:
Ease of Automation:
Folder1
Folder1 is the first "folder" in the path, such as "blog" in "". This is often an effective proxy for site section.
General Usefulness:
Ease of Automation:
Consider instead: Site Section
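Because Folder1 is derived purely from the URL, it is one of the easiest category fields to compute. A minimal Python sketch using the standard library's `urllib.parse`; the `folder1` function name is hypothetical:

```python
from urllib.parse import urlparse

def folder1(url):
    """Return the first path segment of a URL, or "" for the site root."""
    # Split the path on "/" and drop empty pieces from leading/trailing slashes.
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return segments[0] if segments else ""

print(folder1("https://example.com/blog/2020/05/post"))  # blog
```

A top-level file such as `/about.html` comes back as `about.html`, and the site root comes back as an empty string, so decide how to label those before treating Folder1 as a site-section proxy.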
[IA] Depth
The depth from the perspective of the main navigational structures, for instance the Breadcrumb Depth.
General Usefulness:
Ease of Automation:
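IA depth is a property of the navigation rather than the URL, so it needs a model of the navigation. As a sketch, assume you have already extracted each page's parent in the main navigation (for example, from its breadcrumb trail) into a dict; the hypothetical `ia_depth` function walks up to the root and counts the steps, returning `None` for pages the navigation does not include.

```python
def ia_depth(page, parent_of):
    """Count steps from page up to the navigation root (the root has depth 0)."""
    if page not in parent_of:
        return None  # page is not placed in the navigation at all
    depth = 0
    seen = {page}
    while parent_of.get(page) is not None:
        page = parent_of[page]
        if page in seen:  # guard against a malformed, circular navigation
            raise ValueError(f"cycle in navigation at {page!r}")
        seen.add(page)
        depth += 1
    return depth

# Hypothetical navigation: page -> parent page (None marks the root).
nav_parent = {
    "/": None,
    "/programs": "/",
    "/programs/youth": "/programs",
    "/programs/youth/camp": "/programs/youth",
}
print(ia_depth("/programs/youth/camp", nav_parent))  # 3
```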
Site
Within which site (as experienced by the site visitor) does this content appear?
General Usefulness:
Ease of Automation:
Consider instead: Site Type
Site Section
The section of a site (for instance the news section, or a section for a particular program).
General Usefulness:
Ease of Automation:
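When sections map cleanly onto URL paths, Site Section can often be assigned with a short list of prefix rules. A hedged sketch: the rules, section names, and the `site_section` function are hypothetical, and the longest matching prefix wins so that a more specific rule (a particular program) beats a general one (all programs).

```python
from urllib.parse import urlparse

# Hypothetical (path prefix, section) rules; list order does not matter.
SECTION_RULES = [
    ("/news/", "News"),
    ("/programs/", "Programs"),
    ("/programs/youth/", "Youth Program"),
]

def site_section(url, rules=SECTION_RULES, default="Unassigned"):
    path = urlparse(url).path
    # Check longer (more specific) prefixes first.
    for prefix, section in sorted(rules, key=lambda rule: len(rule[0]), reverse=True):
        if path.startswith(prefix):
            return section
    return default

print(site_section("https://example.com/programs/youth/camp"))  # Youth Program
```

Pages that land in `"Unassigned"` are a useful worklist: they either need a new rule or a manual decision.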
Site Type
For large-scale digital presences, grouping sites by type can be a highly effective way to manage and transform them.
General Usefulness:
Ease of Automation:
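As an illustration of grouping by site type, the sketch below assumes each site can be identified by its hostname (not always true, since a site as visitors experience it may span several hosts or share one) and uses a hypothetical hostname-to-type mapping to count URLs per site type.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from hostname (standing in for "site") to site type.
SITE_TYPES = {
    "www.example.edu": "Main Site",
    "news.example.edu": "News Site",
    "biology.example.edu": "Department Site",
    "chemistry.example.edu": "Department Site",
}

def count_by_site_type(urls, site_types=SITE_TYPES):
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # lower-cased, port removed
        counts[site_types.get(host, "Unknown")] += 1
    return counts

urls = [
    "https://www.example.edu/about",
    "https://Biology.example.edu/faculty",
    "https://chemistry.example.edu/labs",
    "https://old.example.edu/archive",
]
print(count_by_site_type(urls))
```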
Source System
Where is the primary source of content for this URL? For instance, what CMS, document management system, or product information system does this content primarily come from?
General Usefulness:
Ease of Automation:
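Source System is rarely visible on the page itself, but URL patterns often reveal it. A minimal sketch with hypothetical patterns and system names: rules are checked in order, the first match wins, and anything unmatched is flagged as `"Unknown"` for manual review.

```python
import re

# Hypothetical (pattern, source system) rules, checked in order.
SOURCE_RULES = [
    (re.compile(r"^https?://docs\.example\.com/"), "Document Management System"),
    (re.compile(r"\.pdf$", re.IGNORECASE), "Document Management System"),
    (re.compile(r"/products/[^/?#]+/?$"), "Product Information System"),
]

def source_system(url, rules=SOURCE_RULES, default="Unknown"):
    for pattern, system in rules:
        if pattern.search(url):
            return system
    return default

print(source_system("https://www.example.com/products/widget-9"))  # Product Information System
```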
Topic
The topic/subject of the content.
General Usefulness:
Ease of Automation:


General usefulness is a blend of the difficulty in getting the value and how useful it is once you have it. These stars roughly correspond to:

  • ★★★ Broadly Useful. These would be worth including in most analyses.
  • ★★ Frequently useful for particular needs. These may not be quite as broadly useful, but they are often useful for specific purposes. Notably, if you want a field for a general reason and it is rated two stars, you may wish to look through the other fields in the same category for one that is more broadly useful.
  • ★ Rarely useful. These are listed because they still have a "following" or because they are easy to implement and therefore tempting to rely upon. Your mileage may vary, but in general these are less useful fields.

Ease of automation is how easy it is to get the value:

  • ⚙⚙⚙⚙ Easy to automate. Completely point-and-shoot automation (although there can be exceptions where some information is more difficult to extract than it should be).
  • ⚙⚙⚙ Relatively easy to automate. With the right tool, this can almost certainly be automated with very limited configuration (not requiring deep technical knowledge).
  • ⚙⚙ Can probably be automated. With the right tool and some technical ability and/or time to configure (for instance, providing XPath and regex information to select content out of a page), this can probably be automated. But it is probably not a matter of clicking a single button to set up and run (unlike the three- and four-gear items).
  • ⚙ Very difficult to automate. This almost certainly requires manual intervention. Note, however, that in many cases you can categorize a sample manually, derive rules from it, apply those rules, and repeat.
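The sample-rules-repeat technique mentioned in the last bullet can be sketched as follows, using a Topic field as the example. Assume you have hand-tagged a sample of pages and written keyword rules from what you saw; apply the rules to everything, review what is left untagged, extend the rules, and run again. The rules, page titles, and `apply_rules` function are hypothetical, and matching whole words in titles is a deliberately crude stand-in for whatever signal (body text, metadata) your data supports.

```python
import re

def apply_rules(pages, rules):
    """Tag each page with the first topic whose keywords appear as words in its title."""
    tagged, untagged = {}, []
    for url, title in pages.items():
        words = set(re.findall(r"[a-z]+", title.lower()))
        for topic, keywords in rules.items():
            if words & set(keywords):
                tagged[url] = topic
                break
        else:
            untagged.append(url)  # no rule matched: review these next round
    return tagged, untagged

# Hypothetical pages (URL -> title) and rules drawn from a hand-tagged sample.
pages = {
    "/apply": "How to Apply",
    "/grants": "Research Grants Overview",
    "/contact": "Contact Us",
}
rules = {
    "Admissions": ["apply", "admissions", "tuition"],
    "Research": ["research", "lab", "grants"],
}

# Round 1: apply the rules and see what is left over.
tagged, untagged = apply_rules(pages, rules)
print(untagged)  # ['/contact']

# Round 2: after reviewing the leftovers, add a rule and repeat.
rules["Contact"] = ["contact"]
tagged, untagged = apply_rules(pages, rules)
print(untagged)  # []
```

Spot-checking a few tagged pages each round matters as much as clearing the untagged list, since a too-broad keyword silently mis-tags content.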

All of the above ratings apply to the general case. You may have particular needs for fields that are generally not useful, and you may already have clean data that makes automation trivial for some fields that are generally difficult.