
File Format

File format (as opposed to content type) is the actual format of the file as delivered by the web server (PDF, HTML, etc.). This is especially useful for sites with a large amount of non-HTML content.
General Usefulness: ★ ★ ★
Ease of Automation: ⚙⚙⚙⚙
⚠ Gotchas ⚠

Note that this is different from the file extension, since several extensions can map to the same format. For example, .jpeg and .jpg are both valid extensions for a JPEG image.

Source Types:
CMS, Crawler, Formula
Example values:
PDF, Word, Excel, CSV, JPEG, SPSS, Powerpoint, Zip

Other closely related fields you may consider

Note that lower level fields may sometimes be needed to compute more useful fields. Also, sometimes the higher level fields may be more difficult to compute, so they are not always worth it.

↑↑↑ Consider this higher level field ↑↑↑
File Format (you are here)
↓↓↓ Often lower level fields are less useful ↓↓↓
Sometimes confused with: Content Type

In Content Chimera

Chimera determines file format by looking at the MIME type returned by the server (the Content-Type response header) as well as the file extension. The result is stored in the file_format field.
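
As a rough illustration of combining the two signals, the sketch below tries the MIME type first and falls back to the file extension. This is not Chimera's actual implementation: the function name guess_file_format, both lookup tables, and the "Unknown" fallback label are assumptions made up for this example, and the tables cover only a few of the example values listed above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical lookup tables (illustrative only, far from complete).
MIME_TO_FORMAT = {
    "application/pdf": "PDF",
    "text/html": "HTML",
    "image/jpeg": "JPEG",
    "text/csv": "CSV",
    "application/zip": "Zip",
    "application/msword": "Word",
    "application/vnd.ms-excel": "Excel",
    "application/vnd.ms-powerpoint": "Powerpoint",
}

# Several extensions can point at the same format (.jpg and .jpeg, for instance).
EXTENSION_TO_FORMAT = {
    ".pdf": "PDF",
    ".html": "HTML",
    ".htm": "HTML",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".csv": "CSV",
    ".zip": "Zip",
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
    ".ppt": "Powerpoint",
    ".pptx": "Powerpoint",
    ".sav": "SPSS",
}


def guess_file_format(url, content_type=None):
    """Return a file format label for a URL, or "Unknown" if nothing matches."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in MIME_TO_FORMAT:
            return MIME_TO_FORMAT[mime]
    # Fall back to the extension of the URL path (query string ignored).
    extension = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(extension, "Unknown")
```

For example, guess_file_format("https://example.com/photo.jpg") and guess_file_format("https://example.com/photo.jpeg") both yield the same JPEG label, and a URL with no extension can still be classified when the server reports a recognized MIME type. Trying the MIME type first is a design choice of this sketch, not a documented rule; a generic type such as application/octet-stream simply falls through to the extension.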