If we take the traditional view of a content inventory or audit, we have rows
representing each page (so each row has a unique URL) and then we have columns
for things like the meta description or crawl depth. These columns are the different
fields we have available to us in our content analysis.
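As a minimal sketch of that row-and-column view (the sample URLs, the column names, and the `load_inventory` helper are all hypothetical, chosen only for illustration), an inventory can be loaded so that each URL maps to one row of field values:

```python
import csv
import io

# Hypothetical inventory: one row per page, one column per field.
INVENTORY_CSV = """url,title,meta_description,crawl_depth
https://example.com/,Home,Welcome to Example,0
https://example.com/products,Products,Our product range,1
https://example.com/products/widget,Widget,,2
"""

def load_inventory(text):
    """Return {url: row}, rejecting duplicate URLs since each row is a unique page."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        url = row["url"]
        if url in rows:
            raise ValueError(f"duplicate URL in inventory: {url}")
        rows[url] = row
    return rows

inventory = load_inventory(INVENTORY_CSV)
# Every column other than the URL is a field available for analysis.
fields = [name for name in next(iter(inventory.values())) if name != "url"]
```

Keying on the URL makes the uniqueness assumption explicit: a repeated URL fails loudly instead of silently overwriting a row.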
If you are analyzing a focused site, you can probably get away
with using a spreadsheet, unless you are trying to set up an
ongoing content analysis dashboard.
That said, any extra fields you add will magnify your
manual work. So you should still avoid a kitchen-sink approach
to your analysis and be deliberate with the fields you add.
If you are analyzing a complex site, you need to be very
aware of how difficult it will be to get the values (or
figure out a way to sample and create rules to leverage manual
effort).
Some fields
have more value than others in content analysis (the content analysis value is the
y axis) and some are more amenable to automation (the x axis).
③. Select fields toward your goal, grounded in your prioritized list of questions you want answered.
Although you can use this database however you like,
in general we recommend that you build up a list of
fields that will be useful for your analysis. To do so,
just click on the heart next to any field name.
After you have hearted some fields, you
can see an analysis of your list at My List ♥
(at which point you can move to ④. Start iterating on your analysis, starting with the basics).
When planning a transformation, it can be useful to bucket similar content (often grouping content that will be treated similarly to explain the situation to stakeholders).
Particularly useful for product or property pages, this field represents the revenue generated by this product or property (regardless of whether it was directly generated by digital channels or not).
Content Type (semantic type of content, such as Product Page or Event) is usually an extremely effective way to group and look for patterns across a digital presence.
File format (as opposed to content type) is the actual format of the file as delivered by the web server (PDF, HTML, etc). This is especially useful for sites with a large amount of non-HTML content.
On particularly complex digital presences, there may be so many file formats that seeing all of them in charts or presentations is confusing. File Group groups the file formats, for instance "Data or Spreadsheet" to capture CSV and Excel files.
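One way to picture File Group is as a lookup from file format to a coarser label. In this rough sketch, only the "Data or Spreadsheet" group comes from the description above; the other group names, the mapping itself, and the helper functions are assumptions made for illustration:

```python
from collections import Counter

# Hypothetical mapping from file format to File Group.
# Only "Data or Spreadsheet" is named in the text; the rest are assumed.
FILE_GROUPS = {
    "CSV": "Data or Spreadsheet",
    "XLS": "Data or Spreadsheet",
    "XLSX": "Data or Spreadsheet",
    "HTML": "Web Page",
    "PDF": "Document",
    "DOCX": "Document",
}

def file_group(file_format):
    """Return the group for a file format, or "Other" when it is not mapped."""
    return FILE_GROUPS.get(file_format.strip().upper(), "Other")

def group_counts(formats):
    """Count pages per File Group, which is what a chart would display."""
    return Counter(file_group(f) for f in formats)
```

For example, `group_counts(["csv", "XLSX", "pdf"])` tallies two items under "Data or Spreadsheet" and one under "Document", so a chart shows a handful of groups rather than every individual format.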
The count of pages in a PDF can help us understand whether there are primarily short PDFs (perhaps most easily converted to HTML) or very long PDFs (perhaps for specialist audiences).
Page views are often the first thing to be added from an additional source, after the basic rows of content in the inventory/audit have been determined. Although page views are not always the most useful measure of the value of content, they are often the most immediately tangible proxy.
An example of a problem (on a specific page) you are investigating. This field could be repeated in an analysis, with actual fields like "Table Example" or "Bad Character Encoding Example".
Reading Level represents the education level required to understand text. Since much content is overly complex, this can be useful to identify where there may be education requirement mismatches.
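The text does not say how Reading Level is calculated. As one possible approach, the sketch below uses the Flesch-Kincaid Grade Level formula, which estimates a US school grade from sentence length and syllables per word. The syllable counter is a crude vowel-run heuristic and both function names are hypothetical, so treat the result as a rough signal only:

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Estimate the Flesch-Kincaid Grade Level; return None for empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return None
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Comparing this estimate with the education level of the intended audience is one way to surface the mismatches mentioned above.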
Where is the primary source of content for this URL? For instance, what CMS, document management system, or product information system does this content primarily come from?
What the *desired* field value would be. For instance, there may be an existing content type but in some cases it should be another content type (so you would have a Content Type field as well as a Target Content Type field).
The title of the content is the most useful field for people looking at individual "rows" of an inventory. That said, unlike URLs, titles are not guaranteed to be unique.
General usefulness is a blend of the difficulty in getting the value
and how useful it is once you have it. These stars roughly correspond to:
★★★ Broadly Useful. These would be worth including in most analyses.
★★ Frequently useful for particular needs. These may not be quite as broadly
useful, but they are frequently useful. Notably, if you have a general reason for a field
that is rated two stars then you may wish to go to the category and look for others that
may be slightly more useful.
★ Rarely useful. These are listed since they still have a "following"
or because they are easy to implement so are tempting to rely upon. Your mileage of course
may vary, but in general these are less useful fields.
Ease of automation is how easy it is to get the value:
⚙⚙⚙⚙ Easy to automate. Completely point-and-shoot automation (although, of course,
there can be exceptions to when some information is more difficult to extract than it should be).
⚙⚙⚙ Relatively easy to automate. With the right tool, this can
almost certainly be automated with very limited configuration (not requiring deep technical
knowledge).
⚙⚙ Can probably be automated. With the right tool and some
technical ability and/or time to configure (for instance, providing XPath and regex
information to select content out of a page) this can probably be automated. But it
is probably not a matter of clicking a single button to set up and run (unlike the
three- and four-gear items).
⚙ Very difficult to automate. This almost certainly
requires manual intervention. Note: in many cases you can apply a technique where you
sample a set of pages, create rules from what you find, and then repeat.
All of the above ratings are for the general case. You may have particular
needs for fields that are generally not useful, and you may already have some
clean data that makes automation trivial for some elements that are more generally
difficult.
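The sample-then-rules technique mentioned for hard-to-automate fields can be sketched roughly as follows. The URL patterns, the function names, and the choice of Content Type as the field being filled in are all assumptions for illustration. The idea is to apply the rules you have, manually review a small sample of what they missed, write new rules from that sample, and repeat:

```python
import random
import re

# Hypothetical starting rules: (URL pattern, Content Type value).
RULES = [
    (re.compile(r"/products/[^/]+$"), "Product Page"),
    (re.compile(r"/events/"), "Event"),
]

def classify(url, rules):
    """Return the value from the first matching rule, or None if none match."""
    for pattern, content_type in rules:
        if pattern.search(url):
            return content_type
    return None

def apply_rules(urls, rules):
    """Split URLs into those covered by a rule and those still unclassified."""
    classified, unclassified = {}, []
    for url in urls:
        content_type = classify(url, rules)
        if content_type is None:
            unclassified.append(url)
        else:
            classified[url] = content_type
    return classified, unclassified

def sample_for_review(unclassified, size, seed=0):
    """Pick a reproducible random sample of uncovered URLs for manual review."""
    rng = random.Random(seed)
    return rng.sample(unclassified, min(size, len(unclassified)))
```

After reviewing the output of `sample_for_review`, you would append new patterns to the rules and call `apply_rules` again, stopping once the unclassified remainder is small enough to handle by hand.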