Category
Fields for Content Inventories, Audits, & other Analysis
Field Type: Category
Categorization is essential for content analysis. Some of the uses include:
Seeing the pervasiveness of something, such as average reading level by site section. If we have two categories, then we can slice two ways (such as reading levels by content type and site section).
Categories are often useful for making decisions about content (such as deciding to delete all blog posts on a site that are over some age).
Sometimes categories are one of the main things we are analyzing, for instance to understand the pervasiveness of certain metadata values by content type on a currently unstructured site.
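As a minimal sketch of slicing by one category and then by two, assume a small inventory where each page carries a section, a content type, and a reading level. The field names (`section`, `content_type`, `reading_level`), the helper `average_by`, and all values are invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inventory rows; field names and values are illustrative only.
pages = [
    {"url": "/blog/a", "section": "Blog", "content_type": "Article", "reading_level": 9.0},
    {"url": "/blog/b", "section": "Blog", "content_type": "Article", "reading_level": 11.0},
    {"url": "/blog/c", "section": "Blog", "content_type": "Event", "reading_level": 7.0},
    {"url": "/shop/x", "section": "Shop", "content_type": "Product Page", "reading_level": 6.0},
]

def average_by(rows, keys, value):
    """Average `value` for each distinct combination of the category `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(values) for group, values in groups.items()}

# One category: reading level by site section.
by_section = average_by(pages, ["section"], "reading_level")

# Two categories: reading level by site section and content type.
by_section_and_type = average_by(pages, ["section", "content_type"], "reading_level")
```

With one category the Blog section looks uniform. Adding the second category shows that its Article pages and Event pages read at different levels.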
If we take the traditional view of a content inventory or audit, we have rows
representing each page (so each row has a unique URL) and then we have columns
for things like the meta description or crawl depth. These columns are the different
fields we have available to us in our content analysis.
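The row-and-column layout can be sketched as a small CSV export. The column names (`url`, `meta_description`, `crawl_depth`) and the sample rows here are assumptions for illustration, not the output of any particular crawler:

```python
import csv
import io

# Hypothetical inventory export: one row per page, one column per field.
INVENTORY_CSV = """url,meta_description,crawl_depth
https://example.com/,Welcome to Example,0
https://example.com/about,About the company,1
https://example.com/blog/post-1,,2
"""

rows = list(csv.DictReader(io.StringIO(INVENTORY_CSV)))
fields = list(rows[0].keys())

# Each row should represent exactly one page, so URLs must be unique.
urls = [row["url"] for row in rows]
duplicate_urls = {u for u in urls if urls.count(u) > 1}

# Pages missing a value for a given field are often worth a closer look.
missing_description = [row["url"] for row in rows if not row["meta_description"]]
```

Checking for duplicate URLs before any analysis is a cheap way to confirm that the inventory really does have one row per page.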
If you are analyzing a focused site, unless you are
trying to set up an ongoing content analysis dashboard,
you can probably get away with using a spreadsheet.
That said, any extra fields you add will magnify your
manual work. So do not take a kitchen-sink approach
to your analysis; be deliberate about the fields you add.
If you are analyzing a complex site, you need to be very
aware of how difficult it will be to get the values (or
figure out a way to sample and create rules to leverage manual
effort).
Some fields
have more value than others in content analysis (the content analysis value is the
y axis) and some are more amenable to automation (the x axis).
③. Select fields toward your goal, grounded in your prioritized list of questions you want answered.
Although you can use this database however you like,
in general we recommend that you build up a list of
fields that will be useful for your analysis. To do so,
just click on the heart next to any field name.
After you have hearted some fields, you
can see an analysis of your list at My List ♥
(at which point you can move to ④. Start iterating on your analysis, starting with the basics).
Content Type (semantic type of content, such as Product Page or Event) is usually an extremely effective way to group and look for patterns across a digital presence.
Where is the primary source of content for this URL? For instance, what CMS, document management system, or product information system does this content primarily come from?
General usefulness is a blend of the difficulty in getting the value
and how useful it is once you have it. These stars roughly correspond to:
★★★ Broadly Useful. These would be worth including in most analyses.
★★ Frequently useful for particular needs. These may not be quite as broadly
useful, but they are frequently useful. Notably, if you have a general reason for a field
that is rated two stars then you may wish to go to the category and look for others that
may be slightly more useful.
★ Rarely useful. These are listed since they still have a "following"
or because they are easy to implement so are tempting to rely upon. Your mileage of course
may vary, but in general these are less useful fields.
Ease of automation is how easy it is to get the value:
⚙⚙⚙⚙ Easy to automate. Completely point-and-shoot automation (although, of course,
there can be exceptions to when some information is more difficult to extract than it should be).
⚙⚙⚙ Relatively easy to automate. With the right tool, this can
almost certainly be automated with very limited configuration (not requiring deep technical
knowledge).
⚙⚙ Can probably be automated. With the right tool and some
technical ability and/or time to configure (for instance, providing xpath and regex
information to select content out of a page) this can probably be automated. But it
probably is not just clicking a single button to set up and run (unlike the three and four
gears items).
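As a rough sketch of what "providing xpath and regex information" can look like, the following uses only the Python standard library. The page markup, the `content-type` meta tag, and the function `extract_fields` are hypothetical. The example also assumes well-formed markup, which real HTML often is not:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page markup. Real HTML is often not valid XML
# and would need a tolerant HTML parser instead of ElementTree.
PAGE = """<html>
  <head>
    <title>Spring Open House | Example</title>
    <meta name="content-type" content="Event" />
  </head>
  <body><h1>Spring Open House</h1></body>
</html>"""

def extract_fields(url, markup):
    """Pull two category values for one page: content type and site section."""
    root = ET.fromstring(markup)
    # XPath (the limited subset ElementTree supports) selects the meta tag.
    meta = root.find(".//meta[@name='content-type']")
    # Regex pulls the first path segment out of the URL as the site section.
    match = re.match(r"https?://[^/]+/([^/?#]+)", url)
    return {
        "content_type": meta.get("content") if meta is not None else None,
        "section": match.group(1) if match else None,
    }

extracted = extract_fields("https://example.com/events/spring-open-house", PAGE)
```

The configuration effort lies in finding the right selector and pattern for a given site, which is why fields like this rate two gears rather than three or four.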
⚙ Very difficult to automate. This almost certainly
requires manual intervention. Note: in many cases you can apply a technique
of sampling pages, writing rules based on the sample, and then repeating.
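One possible reading of the sample, rules, and repeat technique is sketched below. The URL patterns, content type names, and helper functions are all hypothetical. The manual review step is represented only by a comment, since it happens outside the code:

```python
import random
import re

# Hypothetical rules mapping URL patterns to a Content Type.
rules = [
    (re.compile(r"^/blog/"), "Blog Post"),
    (re.compile(r"^/events/"), "Event"),
]

def categorize(url, rules):
    """Return the content type of the first matching rule, or None."""
    for pattern, content_type in rules:
        if pattern.search(url):
            return content_type
    return None

def uncategorized(urls, rules):
    """List the URLs that no rule covers yet."""
    return [u for u in urls if categorize(u, rules) is None]

def draw_sample(urls, size, seed=0):
    """Pick a reproducible random sample for manual review."""
    rng = random.Random(seed)
    return rng.sample(urls, min(size, len(urls)))

urls = ["/blog/a", "/blog/b", "/events/x", "/products/1", "/products/2", "/about"]

remaining = uncategorized(urls, rules)       # pages no rule covers yet
to_review = draw_sample(remaining, size=2)   # small sample for manual review

# Suppose the manual review shows that /products/ pages are Product Pages:
rules.append((re.compile(r"^/products/"), "Product Page"))
remaining_after = uncategorized(urls, rules)  # repeat until this is small enough
```

Each pass turns a little manual effort into a rule that covers many pages, so the uncategorized remainder shrinks with every iteration.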
Obviously, all of the above ratings describe the general case. You may have particular
needs for fields that are generally not useful, and you may already have some
clean data that makes automation trivial for some elements that are more generally
difficult.