
Fields for Content Inventories, Audits, & other Analysis

If we take the traditional view of a content inventory or audit, we have rows representing each page (so each row has a unique URL) and then we have columns for things like the meta description or crawl depth. These columns are the different fields we have available to us in our content analysis.
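As a rough illustration of this rows-and-columns view, the sketch below holds an inventory as a list of dictionaries (one per URL) and writes it out as CSV. The sample rows, the example.com URLs, and the `to_csv` helper are all hypothetical; only the field names come from this database.

```python
import csv
import io

# Hypothetical inventory: one dict per page, keyed by field name.
rows = [
    {"URL": "https://example.com/", "Title": "Home", "Crawl Depth": 0},
    {"URL": "https://example.com/blog/", "Title": "Blog", "Crawl Depth": 1},
]

def to_csv(rows):
    """Serialize inventory rows to CSV text, enforcing one row per URL."""
    urls = [row["URL"] for row in rows]
    if len(urls) != len(set(urls)):
        raise ValueError("each row must have a unique URL")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()   # the columns are the fields
    writer.writerows(rows) # each row is one page
    return buffer.getvalue()

print(to_csv(rows))
```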

①. Define what you are trying to accomplish.

Your content analysis needs to be grounded in your analysis goal.

②. Define your analysis approach.

Size and complexity of your digital presence

Size and complexity of your digital presence should drive your content analysis approach.

Your approach

Content analysis does not necessarily mean opening up a spreadsheet. Before diving in, you should define your basic approach to the analysis.

③. Select fields toward your goal, grounded in your prioritized list of questions you want answered.

Although you can use this database however you like, we generally recommend building a list of the fields that will be useful for your analysis. To do so, click the heart next to any field name. Once you have hearted some fields, you can see an analysis of your list at My List ♥, and then move on to step ④: start iterating on your analysis, beginning with the basics.

Audience
The audience the content is actually written for (whether or not that is the audience it is supposed to target).
General Usefulness:
Ease of Automation:
Compare with other User fields.
Author
The person(s) who wrote the content. This may be different from whoever published or built the page.
General Usefulness:
Ease of Automation:
Compare with other Org fields.
Bucket
When planning a transformation, it can be useful to bucket similar content (often grouping content that will be treated similarly to explain the situation to stakeholders).
General Usefulness:
Ease of Automation:
Consider instead: Disposition
[Category] Revenue
Particularly useful for product or property pages, this field represents the revenue generated by this product or property (regardless of whether it was directly generated by digital channels or not).
General Usefulness:
Ease of Automation:
Compare with other Org fields.
Content Type
Content Type (semantic type of content, such as Product Page or Event) is usually an extremely effective way to group and look for patterns across a digital presence.
General Usefulness:
Ease of Automation:
Compare with other Category fields.
Crawl Depth
How many links the crawler needed to follow to get to this item.
General Usefulness:
Ease of Automation:
Consider instead: [IA] Depth
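One way to picture Crawl Depth is as a breadth-first walk over the link graph, where each page's depth is the fewest links followed from the start page. The sketch below assumes a breadth-first crawler and a hypothetical in-memory `links` mapping; real crawlers may report depth differently.

```python
from collections import deque

def crawl_depths(links, start):
    """Return {url: depth}, where depth is the fewest links followed from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/", "/"],
    "/blog/post-1/": ["/about/"],
}

print(crawl_depths(links, "/"))
```

Here "/about/" gets depth 1 because the home page links to it directly, even though the blog post links to it as well.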
Date Published
Date the content was originally published. This is frequently a useful factor in deciding what content can be culled.
General Usefulness:
Ease of Automation:
Compare with other Quality fields.
Disposition
This is the treatment a piece of content will get during a transformation.
General Usefulness:
Ease of Automation:
Consider instead: Effort
Division
The organizational division (or department, vice presidency, company, etc.) that owns the page.
General Usefulness:
Ease of Automation:
Compare with other Org fields.
Effort
Expected manual effort to transform the content item.
General Usefulness:
Ease of Automation:
Compare with other Decision fields.
File Format
File format (as opposed to content type) is the actual format of the file as delivered by the web server (PDF, HTML, etc.). This is especially useful for sites with a large amount of non-HTML content.
General Usefulness:
Ease of Automation:
Consider instead: File Group
File Group
On particularly complex digital presences, there may be so many file formats that seeing all of them in charts or presentations is confusing. File Group groups the file formats, for instance "Data or Spreadsheet" to capture CSV and Excel files.
General Usefulness:
Ease of Automation:
Compare with other Basic fields.
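A File Group field can be as simple as a lookup table from file format to group. In the sketch below, only the "Data or Spreadsheet" group for CSV and Excel comes from the description above; the other group names and the `file_group` and `group_counts` helpers are assumptions for illustration.

```python
from collections import Counter

# Format -> group. "Data or Spreadsheet" is from the field description;
# the remaining groups are illustrative assumptions.
FILE_GROUPS = {
    "CSV": "Data or Spreadsheet",
    "Excel": "Data or Spreadsheet",
    "HTML": "Web Page",
    "PDF": "Document",
    "Word": "Document",
}

def file_group(file_format):
    """Return the group for a file format, or "Other" if it is unmapped."""
    return FILE_GROUPS.get(file_format, "Other")

def group_counts(file_formats):
    """Count inventory items per file group, e.g. for a summary chart."""
    return Counter(file_group(fmt) for fmt in file_formats)

print(group_counts(["CSV", "Excel", "PDF", "HTML", "ZIP"]))
```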
Folder1
Folder1 is the first "folder" in the path, such as "blog" in "test.com/blog/". This is often an effective proxy for site section.
General Usefulness:
Ease of Automation:
Consider instead: Site Section
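Folder1 can usually be derived directly from the URL. The sketch below uses Python's standard `urllib.parse`; the `folder1` function name, and the choice to treat a final path segment without a trailing slash as a page rather than a folder, are assumptions you may want to adjust.

```python
from urllib.parse import urlsplit

def folder1(url):
    """Return the first folder in the URL path, or "" if there is none."""
    if "//" not in url:
        url = "//" + url  # lets urlsplit recognize a scheme-less host
    path = urlsplit(url).path
    segments = [segment for segment in path.split("/") if segment]
    if not path.endswith("/"):
        segments = segments[:-1]  # assumption: last segment is a page, not a folder
    return segments[0] if segments else ""

print(folder1("test.com/blog/"))                  # blog
print(folder1("https://test.com/blog/post.html")) # blog
```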
Has [Problem]
Yes or no, does this piece of content have this specific problem? The actual field name would depend on your situation, such as "Has Wall of Text".
General Usefulness:
Ease of Automation:
Compare with other Quality fields.
[IA] Depth
The depth from the perspective of the main navigational structures, for instance the Breadcrumb Depth.
General Usefulness:
Ease of Automation:
Compare with other Category fields.
MIME Type
The technical content type reported by the web server, which the web browser uses to determine how to display it.
General Usefulness:
Ease of Automation:
Consider instead: File Format
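Because MIME types are technical strings, a small lookup is often enough to turn them into friendlier File Format values. The sketch below is one possible mapping; the `MIME_TO_FORMAT` table is deliberately short and the "Other" fallback is an assumption.

```python
# A few common MIME types mapped to reader-friendly format names.
MIME_TO_FORMAT = {
    "text/html": "HTML",
    "application/pdf": "PDF",
    "text/csv": "CSV",
    "application/vnd.ms-excel": "Excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "Excel",
}

def file_format(content_type):
    """Map a Content-Type header value to a file format name."""
    # Drop parameters such as "; charset=UTF-8" before the lookup.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return MIME_TO_FORMAT.get(mime_type, "Other")

print(file_format("text/html; charset=UTF-8"))  # HTML
```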
Meta Description
The meta description. This is of limited use, aside from simply discovering what pages do not have a meta description (and therefore require one).
General Usefulness:
Ease of Automation:
Compare with other Technical fields.
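Finding pages without a meta description is easy to script. The sketch below uses the standard library's `html.parser`; the `MetaDescriptionParser` class and `meta_description` function are hypothetical names, and most crawling tools will report this field without any custom code.

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Collect the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = (attributes.get("content") or "").strip()

def meta_description(html):
    """Return the page's meta description, or "" if it is missing or empty."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description or ""

page = '<html><head><meta name="description" content="About us."></head></html>'
print(meta_description(page))  # About us.
```

Rows where this returns an empty string are the pages that still need a description.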
Meta Keywords
Meta keywords. In most cases, very limited value. More precise meta tags (for instance topics) are usually far more useful if they exist.
General Usefulness:
Ease of Automation:
Consider instead: Topic
Near Text Duplicate
Is there a near text duplicate of the page? If so, what is the URL of that near duplicate?
General Usefulness:
Ease of Automation:
Compare with other Quality fields.
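One common way to approximate near text duplicates is to compare sets of word shingles with Jaccard similarity. The sketch below is a minimal version of that idea, not a description of how any particular tool does it; the shingle size, the 0.7 threshold, and the sample pages are all assumptions, and the pairwise comparison will be slow on large inventories.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Shared shingles divided by all distinct shingles across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def near_duplicates(pages, threshold=0.7):
    """Map each URL to its most similar other URL, or "" if none reach threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    result = {}
    for url, own in sets.items():
        best_url, best_score = "", 0.0
        for other, theirs in sets.items():
            if other == url:
                continue
            score = jaccard(own, theirs)
            if score >= threshold and score > best_score:
                best_url, best_score = other, score
        result[url] = best_url
    return result

# Hypothetical page text keyed by URL.
pages = {
    "/a": "Our office is open Monday to Friday from nine to five.",
    "/b": "Our office is open Monday to Friday from nine to six.",
    "/c": "Read the latest news about our products and events.",
}
print(near_duplicates(pages))
```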
PDF Page Count
The count of pages in a PDF can help us understand whether there are primarily short PDFs (perhaps most easily converted to HTML) or very long PDFs (perhaps for specialist audiences).
General Usefulness:
Ease of Automation:
Compare with other Technical fields.
Page Views
Page views are often the first thing to be added from an additional source, after the basic rows of content in the inventory/audit have been determined. Although not always the most useful metric for the value of content, it's often the most immediately tangible proxy.
General Usefulness:
Ease of Automation:
Consider instead: [Success Event] Count
[Problem] Count
How often does the problem happen on the page? This would be a specific issue, so something like "Left Nav Count".
General Usefulness:
Ease of Automation:
Compare with other Quality fields.
[Problem] Example
An example of a problem (on a specific page) you are investigating. This field could be repeated in an analysis, with actual fields like "Table Example" or "Bad Character Encoding Example".
General Usefulness:
Ease of Automation:
Compare with other Quality fields.
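The three problem fields (Has [Problem], [Problem] Count, and [Problem] Example) can often be filled from a single pattern match. The sketch below uses a regular expression for HTML tables as the example problem; the pattern, the `problem_fields` function, and the "Table" label are assumptions, and a real analysis might use XPath or a proper HTML parser instead.

```python
import re

# Hypothetical problem: HTML tables. The pattern matches opening <table> tags.
TABLE_PATTERN = re.compile(r"<table\b[^>]*>", re.IGNORECASE)

def problem_fields(html, pattern=TABLE_PATTERN, label="Table"):
    """Return Has/Count/Example fields for one problem on one page."""
    matches = pattern.findall(html)
    return {
        f"Has {label}": "Yes" if matches else "No",
        f"{label} Count": len(matches),
        f"{label} Example": matches[0] if matches else "",
    }

html = '<p>Intro</p><table class="data"><tr></tr></table><TABLE></TABLE>'
print(problem_fields(html))
```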
Reading Level
Reading Level represents the education level required to understand text. Since much content is overly complex, this can be useful to identify where there may be education requirement mismatches.
General Usefulness:
Ease of Automation:
Compare with other User fields.
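Reading level is typically estimated with a readability formula. The sketch below uses the Flesch-Kincaid grade level as one example (this database does not say which formula to use), with a crude vowel-group syllable counter that is an assumption and will miscount some words.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Estimate the US school grade needed to understand the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(word) for word in words)
    return (
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59
    )

simple = "The cat sat. The dog ran."
dense = "Organizational transformation necessitates comprehensive stakeholder collaboration."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))  # True
```

Treat the scores as relative indicators for spotting outliers rather than as precise grade levels.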
Redundant
Is the information redundant with respect to other content on the site? This is one of the three fields in the highly popular ROT (Redundant, Outdated, Trivial) analysis.
General Usefulness:
Ease of Automation:
Consider instead: Near Text Duplicate
Resourcing
Who will do the actual transformation? For a large site this would be a team, and for a smaller site it may be an individual.
General Usefulness:
Ease of Automation:
Compare with other Decision fields.
Site
Within which site (as experienced by the site visitor) does this content appear?
General Usefulness:
Ease of Automation:
Consider instead: Site Type
Site Section
The section of a site (for instance the news section, or a section for a particular program).
General Usefulness:
Ease of Automation:
Compare with other Category fields.
Site Type
For large scale digital presences, grouping sites by type can be a highly effective way of managing and transforming.
General Usefulness:
Ease of Automation:
Compare with other Category fields.
Source System
Where is the primary source of content for this URL? For instance, what CMS, document management system, or product information system does this content primarily come from?
General Usefulness:
Ease of Automation:
Compare with other Category fields.
[Success Event] Count
The count of success events triggered from this page, such as the count of purchases or the count of downloads.
General Usefulness:
Ease of Automation:
Compare with other User fields.
[Target] Field
What the *desired* field value would be. For instance, there may be an existing content type but in some cases it should be another content type (so you would have a Content Type field as well as a Target Content Type field).
General Usefulness:
Ease of Automation:
Compare with other Decision fields.
Title
The title is the most useful field for people looking at individual "rows" of an inventory. That said, unlike URLs, titles are not guaranteed to be unique.
General Usefulness:
Ease of Automation:
Compare with other Basic fields.
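Since titles are not guaranteed to be unique, it can be worth listing the ones that repeat. The sketch below groups URLs by normalized title; the `duplicate_titles` function and the sample rows are hypothetical.

```python
from collections import defaultdict

def duplicate_titles(rows):
    """Return {normalized title: [urls]} for titles used by more than one URL."""
    urls_by_title = defaultdict(list)
    for row in rows:
        # Normalize so that case and stray whitespace do not hide duplicates.
        urls_by_title[row["Title"].strip().lower()].append(row["URL"])
    return {title: urls for title, urls in urls_by_title.items() if len(urls) > 1}

# Hypothetical inventory rows.
rows = [
    {"URL": "https://example.com/contact/", "Title": "Contact Us"},
    {"URL": "https://example.com/support/contact/", "Title": "Contact us "},
    {"URL": "https://example.com/about/", "Title": "About"},
]
print(duplicate_titles(rows))
```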
Tone
How the content is communicated with language. Tone may reasonably vary across a digital presence.
General Usefulness:
Ease of Automation:
Compare with other Brand fields.
Topic
The topic/subject of the content.
General Usefulness:
Ease of Automation:
Compare with other Category fields.
URL
This is a basic, foundational requirement of an inventory where each "row" is a URL.
General Usefulness:
Ease of Automation:
Compare with other Basic fields.
Unique Content ID
An ID unique for the piece of content. This should be unique across the entire list.
General Usefulness:
Ease of Automation:
Compare with other Basic fields.
Voice
The brand voice expressed in the content.
General Usefulness:
Ease of Automation:
Compare with other Brand fields.

Legend

General usefulness is a blend of the difficulty in getting the value and how useful it is once you have it. These stars roughly correspond to:

  • ★★★ Broadly useful. These would be worth including in most analyses.
  • ★★ Frequently useful for particular needs. These may not be quite as broadly useful, but they are frequently useful. Notably, if you have a general reason for a field that is rated two stars then you may wish to go to the category and look for others that may be slightly more useful.
  • ★ Rarely useful. These are listed since they still have a "following" or because they are easy to implement so are tempting to rely upon. Your mileage of course may vary, but in general these are less useful fields.

Ease of automation is how easy it is to get the value:

  • ⚙⚙⚙⚙ Easy to automate. Completely point-and-shoot automation (although, of course, there can be exceptions to when some information is more difficult to extract than it should be).
  • ⚙⚙⚙ Relatively easy to automate. With the right tool, this can almost certainly be automated with very limited configuration (not requiring deep technical knowledge).
  • ⚙⚙ Can probably be automated. With the right tool and some technical ability and/or time to configure (for instance, providing XPath and regex information to select content out of a page), this can probably be automated. But setting it up and running it is probably not just a matter of clicking a single button (unlike the three- and four-gear items).
  • ⚙ Very difficult to automate. This almost certainly requires manual intervention. Note: there is a technique that can be applied in many cases to sample and then use rules and repeat.

All of the above ratings apply to the general case. You may have particular needs for fields that are generally not useful, and you may already have clean data that makes automation trivial for some elements that are generally more difficult.