Digital Picture Captioning
A discussion of some of the issues associated with the captioning of digital images.

I. Lynch, Jan '99 (source: http://www.cepic.org/iptc.htm)

The Need.

Unlike textual data, whose content can be indexed and hence searched at a later time, digital picture data is nothing more than a representation of the pixels that make up the image. Beyond some purely experimental projects, there are currently no methods that can distinguish a picture of, say, a cup from one of a spaceship by examining the pixel data alone. Hence we typically associate or 'bind' some text to a digital image file, so that searching for words within the bound text can locate the associated image.

For small quantities of image files this text association may not be necessary, as we can simply give each picture file a descriptive file name. (Even so, an increasing number of publishers insist that all digital pictures be supplied with a textual description.) However, this method rapidly becomes impracticable as the number of files increases. For larger quantities of files a database is needed to manage their archiving and retrieval. Such databases (known in this context as media archives) 'catalogue' large numbers of image files by recording each word of the associated textual description in lists, a process known as indexing.
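The word-by-word cataloguing described above is usually called an inverted index: each word maps to the set of files whose captions contain it. A minimal sketch in Python, assuming captions are held in memory as a mapping from file name to caption text (the function names and sample captions are hypothetical):

```python
import re
from collections import defaultdict

def build_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Record, for every word in every caption, which image files use it."""
    index: dict[str, set[str]] = defaultdict(set)
    for filename, text in captions.items():
        # Lower-case, then take runs of ASCII letters, digits and apostrophes as words.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(filename)
    return index

def search(index: dict[str, set[str]], *words: str) -> set[str]:
    """Return the files whose captions contain every query word."""
    if not words:
        return set()
    hits = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*hits)

# Usage with three hypothetical captions.
captions = {
    "img001.jpg": "Diana at the Proms, London",
    "img002.jpg": "Space shuttle launch, Cape Canaveral",
    "img003.jpg": "Actors on stage at a London theatre",
}
idx = build_index(captions)
print(sorted(search(idx, "london")))             # ['img001.jpg', 'img003.jpg']
print(sorted(search(idx, "London", "theatre")))  # ['img003.jpg']
```

A real media archive would keep the index on disk and handle stemming, stop words and accented characters, but the principle of looking words up rather than scanning every caption is the same.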

For a digital picture archive whose content is used exclusively 'in-house', the textual description can take any form, so long as enough information is recorded for images to be retrieved successfully. But for archives whose content is destined for 'outside' consumers, such as the publishing industry, an unstructured or proprietary description is not sufficient, because many consumers today maintain their own digital archives and need captions that their own systems can read.

It is therefore necessary to not only caption digital pictures adequately but to also caption them with a structure that is acceptable to the widest possible audience.

The Choices.

Whenever information technology is adopted by a market sector, its growth is always followed by a need for standardisation so that data can be easily interchanged between differing systems. The Internet, for example, has only been able to sustain its phenomenal rate of growth because data is interchanged using a platform-independent standard known as 'HTML'. In medicine, digitised X-rays and other medical images can be interchanged across platforms because they are captioned in a format known as 'DICOM'.

In the publishing industry, however, things are not so well defined, perhaps because publishing is such a diverse activity that no one standard will cover all needs. For digital still pictures there are two possible standards: one ratified jointly by the International Press Telecommunications Council and the Newspaper Association of America (IPTC-NAA), and a subset of the IPTC standard that has itself become a de facto standard because of the widespread use of Adobe's PhotoShop.

IPTC-NAA

This standard was developed by a consortium of worldwide 'content' providers such as Reuters and AP, whose services deliver a range of products (news, financial data, graphics, still pictures, movie and audio clips) over a range of transport mechanisms such as satellite, wide area networks, cable and direct links. The standard is therefore constructed in two parts: the first deals with the routing and delivery of the content, and the second with how the content is described.

In IPTC terms, what gets delivered is not called a picture, a text file, an audio clip or anything else that specific; it is simply called an 'Object'. The addressing information needed to route and deliver the object is known as that object's 'envelope'.

Part One of the IPTC standard, the format of the envelope, is really only of interest to the likes of Reuters and to end users such as national newspapers, broadcasters and financial institutions, as it aids the automatic delivery of items to the right 'desk'. It is less relevant to captioning digital pictures for photo libraries.

It is to Part Two of the standard that many picture suppliers turn in order to structure their captions. This is by no means universal, but since PhotoShop uses a subset of Part Two (as we shall see later), it is about the nearest thing available to common practice.

Table 1 below lists the IPTC standard fields. Although this might look complicated, remember that all we are trying to do here is to bring some standardisation to how we structure the fields of a database. Columns one and two identify each field: the numbers are used by the database, and the names are used for the field headings. Column three gives a brief description of how the field should be used, ie what sort of content it should hold. The fourth column defines the type of data the field should contain: text, dates, etc. Notice that many fields do not contain 'words' but codes that represent entries in standardised lists. For example, the Category field is not a free text string but three alphabetic characters.

Table 1. THE IPTC STANDARD FIELDS

PART 1

ID | FIELD NAME | DESCRIPTION | TYPE
1:00 | Model Version | |
1:05 | Destination | Optional. Used by some providers who need additional routing information |
1:20 | File Format | Mandatory |
1:22 | File Format Version | Mandatory |
1:30 | ServiceId | Mandatory. Identifies the provider and the product |
1:40 | EnvelopeNumber | Mandatory. Used with 1:30 and 1:70 to identify a unique record |
1:50 | ProductId | Optional. Used to identify a subset of the provider's overall service |
1:60 | EnvelopePrior | Optional. Priority of the transmission (not the editorial priority) | 1 / 5 / 8
1:70 | DateSent | Mandatory. Date the service sent the material |
1:80 | TimeSent | Optional. Time the service sent the material |
1:90 | Coded Character Set | Optional |

PART 2

ID | FIELD NAME | DESCRIPTION | TYPE
2:00 | Record Version | Mandatory. The current IPTC Information Interchange Model = 4 |
2:03 | Object Type Ref. | eg 1:News, 2:Data, 3:Advisory | x:yy
2:04 | Object Attribute Ref. | eg 1:Current, 2:Analysis, 3:Archive, and more | x:yy
2:05 | ObjectName | eg "Diana at the Proms" | 64 bytes
2:07 | EditStatus | Status of the object data according to the provider, eg "Correction" | 64 bytes
2:08 | Editorial Update | (To a previous object) | x:yyy (Num)
2:10 | Urgency | Editorial urgency: 1 = most, 5 = normal, 8 = least | x
2:12 | Subject Reference | IPTC:SubRefNum:SubName:SubMatter:SubDetailName | 13 to 236 bytes
2:15 | Category | Identifies the subject of the object in the opinion of the provider | AAA (Alpha)
2:20 | SupplCategory | Further identifies the subject of the object in the opinion of the provider | 32 bytes
2:22 | FixtureId | Identifies frequently occurring object data, eg "Euroweather" | 32 bytes (Alpha)
2:25 | Keywords | | 64 bytes
2:26 | Content Location Code | 3-character code indicating the country in which the event took place, eg XEU | AAA (Alpha)
2:27 | Content Location Name | Name of the country in which the event took place, eg "Europe" | 64 bytes
2:30 | ReleaseDate | The earliest date on which the provider intends the object to be used | Date
2:35 | ReleaseTime | The earliest time at which the provider intends the object to be used | Time
2:37 | Expiration Date | The latest date on which the provider intends the object to be used | Date
2:38 | Expiration Time | The latest time at which the provider intends the object to be used | Time
2:40 | SpecialInstructions | Text | 256 bytes
2:42 | Action Advised | 2 digits, provider defined, eg 01 = Kill object | xx
2:45 | RefService | Only used to refer to a previous 1:30 (Reference Service) | 10 bytes (Alpha)
2:47 | RefDate | Only allowed if 2:45 used |
2:50 | RefNumber | Only allowed if 2:45 used | 8 bytes (Num)
2:55 | DateCreated | Date of the actual event, not when it was digitised | Date
2:60 | TimeCreated | Time of the actual event, not when it was digitised | Time
2:62 | Digital Creation Date | Date the digital representation of the object data was created | Date
2:63 | Digital Creation Time | Time the digital representation of the object data was created | Time
2:65 | OrigProgram | eg "PhotoShop" | 32 bytes
2:70 | ProgramVersion | Only used if 2:65 used, eg "5.0" | 10 bytes
2:75 | ObjectCycle | Virtually only used in the US, for Morning, Evening or both |
2:80 | Byline | Photographer | 32 bytes
2:85 | BylineTitle | eg House photographer, correspondent etc | 32 bytes
2:90 | City | According to the practices of the provider, eg "London" | 32 bytes
2:92 | SubLocation | According to the practices of the provider, eg "Soho" | 32 bytes
2:95 | ProvinceState | According to the practices of the provider, eg "Midlands", "Surrey" | 32 bytes
2:100 | CountryCode | The country code of the subject matter, eg "USA", "GBR" | AAA (Alpha)
2:101 | CountryName | The country name of the subject matter, eg "Great Britain" | 64 bytes
2:103 | OriginalRef | A code indicating where the object was transmitted from | 32 bytes
2:105 | HeadLine | Synopsis of the subject matter | 256 bytes
2:110 | Credit | Identifies the provider, not necessarily the owner/creator | 32 bytes
2:115 | Source | Identifies the original owner/creator | 32 bytes
2:116 | Copyright Notice | The provider's own copyright notice | 128 bytes
2:118 | Contact | For further information on the object | 128 bytes
2:120 | Caption | Full version / rest of the headline | 2000 bytes
2:122 | CaptionWriter | Who wrote the caption | 32 bytes
2:125 | Rasterised Caption | |
2:130 | ImageType | Code representing B/W, Y component of separations etc. | x:A
2:131 | Image Orientation | P = Portrait, L = Landscape, S = Square | 1 byte
2:135 | LanguageIdentifier | |
2:150 | AudioType | |
2:151 | Audio Sampling Rate | |
2:152 | Audio Sampling Res | |
2:153 | Audio Duration | |
2:154 | Audio Outcue | | 64 bytes
2:200 | ObDataPreviewFileFormat | |
2:201 | ObDataPreviewFileFormatVersion | |
2:202 | ObDataPreviewData | |
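Each ID in Table 1 combines a record number (1 for the envelope, 2 for the description) with a dataset number, so 2:120 is record 2, dataset 120. As a minimal sketch, assuming the binary layout of the IPTC Information Interchange Model, in which each field is written as a 0x1C tag marker, one byte each for the record and dataset numbers, a two-byte big-endian length and then the data, and assuming Latin-1 text, a caption field could be encoded and checked against the Table 1 length limits as below. The function names and the partial MAX_LEN table are hypothetical, not a library API.

```python
import struct

# Maximum field lengths in bytes, copied from Table 1 for a few record-2 fields.
MAX_LEN = {
    (2, 5): 64,      # ObjectName
    (2, 25): 64,     # Keywords
    (2, 80): 32,     # Byline
    (2, 105): 256,   # HeadLine
    (2, 116): 128,   # Copyright Notice
    (2, 120): 2000,  # Caption
    (2, 122): 32,    # CaptionWriter
}

def encode_dataset(record: int, dataset: int, value: str, encoding: str = "latin-1") -> bytes:
    """Encode one text field as an IIM dataset: 0x1C, record, dataset, length, data."""
    data = value.encode(encoding)  # assumed character set; 1:90 governs this in practice
    limit = MAX_LEN.get((record, dataset))
    if limit is not None and len(data) > limit:
        raise ValueError(f"{record}:{dataset:02d} is limited to {limit} bytes")
    if len(data) > 0x7FFF:
        raise ValueError("extended-length datasets are not handled in this sketch")
    # '>BBBH' = tag marker, record number, dataset number, 16-bit big-endian length.
    return struct.pack(">BBBH", 0x1C, record, dataset, len(data)) + data

def decode_datasets(blob: bytes) -> list[tuple[int, int, bytes]]:
    """Split a run of standard IIM datasets back into (record, dataset, data) tuples."""
    out = []
    pos = 0
    while pos + 5 <= len(blob):
        marker, record, dataset, length = struct.unpack_from(">BBBH", blob, pos)
        if marker != 0x1C or length & 0x8000:
            raise ValueError(f"unsupported or corrupt dataset at offset {pos}")
        pos += 5
        out.append((record, dataset, blob[pos:pos + length]))
        pos += length
    return out

blob = encode_dataset(2, 5, "Diana at the Proms") + encode_dataset(2, 80, "A. Photographer")
print(decode_datasets(blob))
# [(2, 5, b'Diana at the Proms'), (2, 80, b'A. Photographer')]
```

Rejecting an over-long value at entry time, rather than silently truncating it, is a design choice: it forces the caption writer to decide what to cut.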
PhotoShop

The IPTC standard is now so well established in the digital picture business that Adobe™ considered it important enough to incorporate into PhotoShop™, the industry's prime digital picture manipulation application. PhotoShop, though, uses only a small percentage of the IPTC fields available. To access its IPTC facilities, users open a picture and select 'File Info'. The fields then made available are grouped into sections as follows.

Table 2. Adobe Photoshop 'File Info' Fields

SECTION | FIELDS
Caption | Caption, Caption Writer, Headline, Special Instructions
Keywords | Keyword 1, Keyword 2, etc.
Categories | Category Number, Supplemental Category, Urgency
Credits | Byline, Byline Title, Credit, Source
Origin | Object Name, Date Created, City, Province-State, Country Name, Original Ref
Copyright Notice | Copyright Notice, URL

Given the almost universal use of PhotoShop by picture consumers, it makes sense to consider its captioning structure when constructing databases for use in photo libraries.

Which fields to use?

As a subset of the full standard, the PhotoShop fields are a good place to start, but they have some significant drawbacks and limitations for photo libraries.

Firstly, the scope of the headings is often insufficient. Libraries often need to record additional information such as 'Restrictions', where images have limitations imposed on their sale for copyright, content, exclusivity or other commercial reasons. Although the 'Special Instructions' field might seem appropriate for this, additional fields are often required.

Secondly, the type and length of data that can be entered into each field can cause problems. The 'Caption' field, for example, can only take up to 255 characters, and whilst we might want to use some text in the 'Category' field, we can't, because this field will only accept a three-character code.

And finally, the IPTC standard has moved on since PhotoShop incorporated this scheme, and the standard now officially discourages the use of the 'Category' and 'Sub-Category' fields. Rather than continuing to let picture suppliers define their own categories and sub-categories, the standard now attempts to predefine what these categorisations should be. It's debatable whether or not this is an improvement, but the fact remains that the standard and PhotoShop are now in conflict.

Storing the caption information.

There are three principal places to store a picture's caption information: in a separate text file, as part of the JPEG data stream, or, for Macintosh files, as one or more 'resources'.

Saving the caption as a separate text file is the most flexible method, as it imposes no particular limitations on the textual data and text files can be read on all computer platforms. Its drawback, though, is that two files must be delivered and managed for each picture.

Saving the caption as part of the JPEG data stream avoids having two files and is potentially suitable for both Macintosh and PC. The problem is deciding where precisely in the data stream to store the information. PhotoShop stores its caption at the start of the stream, other products store theirs at the end, and just about any other place can be encountered. Whilst PhotoShop is widespread and can itself import picture files from a number of other applications, it is not always certain that the caption data will be in a place that PhotoShop can understand.
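To make 'the start of the stream' concrete: PhotoShop writes its IPTC data into an APP13 marker segment near the start of the JPEG file, labelled 'Photoshop 3.0' and containing '8BIM' image-resource blocks, of which resource ID 0x0404 holds the IPTC record. The sketch below assumes exactly that layout, walks the JPEG header segments up to the start of the compressed image data, and returns the raw IPTC bytes. It is a simplification: it ignores fill bytes between markers and multiple APP13 segments, and captions that other products store elsewhere, for example at the end of the file, are deliberately not found. Function and constant names are illustrative.

```python
import struct
from typing import Optional

PHOTOSHOP_APP13_ID = b"Photoshop 3.0\x00"  # label at the start of PhotoShop's APP13 segment
IPTC_RESOURCE_ID = 0x0404                  # 8BIM resource holding the IPTC-NAA record

def _iptc_from_8bim(data: bytes) -> Optional[bytes]:
    """Walk '8BIM' image-resource blocks and return the IPTC block's data."""
    pos = 0
    while pos + 12 <= len(data) and data[pos:pos + 4] == b"8BIM":
        (res_id,) = struct.unpack_from(">H", data, pos + 4)
        name_len = data[pos + 6]
        name_field = name_len + 1 + ((name_len + 1) % 2)  # Pascal string padded to even length
        size_pos = pos + 6 + name_field
        (size,) = struct.unpack_from(">I", data, size_pos)
        start = size_pos + 4
        if res_id == IPTC_RESOURCE_ID:
            return data[start:start + size]
        pos = start + size + (size % 2)  # resource data is padded to even length
    return None

def find_iptc_in_jpeg(jpeg: bytes) -> Optional[bytes]:
    """Return the raw IIM bytes PhotoShop stored near the start of a JPEG, or None."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, so stop looking
            break
        (seg_len,) = struct.unpack_from(">H", jpeg, pos + 2)  # length includes these 2 bytes
        payload = jpeg[pos + 4:pos + 2 + seg_len]
        if marker == 0xED and payload.startswith(PHOTOSHOP_APP13_ID):  # 0xED = APP13
            return _iptc_from_8bim(payload[len(PHOTOSHOP_APP13_ID):])
        pos += 2 + seg_len
    return None
```

For a file on disk, something like `with open(path, "rb") as f: iim = find_iptc_in_jpeg(f.read())` would be used; the bytes returned are a run of IIM datasets such as 2:120 Caption. A caption written at the end of the file by another product would come back as None, which is exactly the interoperability problem described above.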

Saving the caption as a resource is common practice, but only because image manipulation has, until recently, been done mostly on Macintosh computers. But what do we mean by a 'resource'? Unlike files generated on PCs, Macintosh files are actually made up of two parts, even though to the outside world they look like single files. In programming terms these two parts are known as 'forks': one for the data and one for the file's 'resources'.

Resources may at first seem a rather abstract concept because they are invisible at the operating system level. But consider for a moment a file that is an application or programme. Programmes need to put up dialogue boxes and messages; they need to open windows, display buttons, and so on. It is convenient to gather all these elements together in the resource fork, as this makes them easier to maintain and edit (translating all the English into another language, for example). For data files the situation is far less complicated, but resources are nonetheless useful.

A file's resource fork can contain such things as custom icons, file version information and, in our case, the caption for the picture data. If a file is opened with an application capable of displaying the resource fork, various resources can be seen, each with a name and a number. The icons that you see on your Mac, for example, vary according to the view you have selected: list, small icons or large icons. These mini pictures are stored in the file's icon resource, which would have a number something like -16455.

We might therefore expect our caption to be contained in some sort of caption resource. This is both true and untrue. Unfortunately, although PhotoShop stores its caption information mostly in a resource known as the ANPA (IPTC) resource, not all of it is stored there, and other applications may store their captions in entirely different resources altogether.

Why the Category field is a problem.

The IPTC standard originally defined categories (ie picture categories) in such a way that it was up to the service providers to come up with schemes of their own. What the standard provided was a three-character code followed by a 64-character description field, plus one further field of up to 32 characters intended to refine these categories: the Supplemental Category field. Thus a provider would design its own scheme along the lines of, say ..

AAA = Sport
AAB = Politics
AAC = Personalities etc.

The provider could then enter whatever it liked in the Supplemental Category field. It doesn't take much to see that this soon degenerates into a pretty non-standard standard, with each picture supplier categorising its content differently. Revision 4 of the standard provides an alternative way to standardise categories and sub-categories, whereby the IPTC itself defines what these should be, in a field called the Subject Reference field. Furthermore, it advises on the use of the Category and Sub-Category fields as follows ..

"Use of these fields is deprecated. It is likely that these fields will not be included in further versions of the standard".

Note: PhotoShop still uses this scheme!

Subject Reference fields

A typical Subject Reference field would look like ..

IPTC : 10170100 : Arts, Culture & Entertainment : Theatre : Actors

Notice that there are five parts to this field each separated by a colon. The first two parts are mandatory and the remaining three are optional.

Subject Reference Part 1 - IPR

This part provides a mandatory 'Information Provider Reference'. Unless Camera Press becomes a registered provider with the IPTC, along with the likes of Agence France Presse ('AFP'), Associated Press ('AP') or Reuters ('Reuters'), this field should contain 'IPTC'.

Subject Reference Part 2 - SRN

The Subject Reference Number is an 8-digit number which completely takes over from the 'Category' and 'Sub-Category' fields. However, not any old number will do here. The IPTC maintains a list of approved numbers relating to specific subjects, so to use an approved number this list needs to be consulted each time a picture is captioned, and it's a long list! Furthermore, unless you as a picture supplier become a registered provider with the IPTC, you have no say in what the list of subject matter contains; you must use the subjects already defined.

The Subject Reference Number is itself built up from three parts.

The first two digits come from a list of subject numbers which match broad categories of subject matter, eg
10 = Arts, Culture & Entertainment

The next three digits are taken from a table which further classifies the preceding section
eg 170 = Theatre

The last three digits are taken from a table which provides more detail to the preceding section
eg 100 = Actors

(This is actually a fictitious example, as the table of classifications was not available.) And so our Subject Reference field would look as follows ...

Subject Reference Field Example

IPR (Information Provider Reference) | SRN (Subject Reference Number) | SN (Subject Name) | SMN (Subject Matter Name) | SDN (Subject Detail Name)
IPTC | 10170100 | Arts, Culture & Entertainment | Theatre | Actors

Now, it would be perfectly legitimate when captioning a picture to enter in the Subject Reference field nothing more than ..

IPTC:10170100

The computers of customers with automated picture reception facilities would be able to interpret the subject matter and detail of this picture exactly, but less fortunate customers would have no idea what the heck '10170100' meant. For this reason the Subject Reference field can have three optional parameters, each separated by a colon, which provide textual interpretations of the Subject Reference Number.

Subject Reference Part 3 - Subject Name

Up to 64 characters eg 'Arts, Culture & Entertainment'

Subject Reference Part 4 - Subject Matter Name

Up to 64 characters eg 'Theatre'

Subject Reference Part 5 - Subject Detail Name

Up to 64 characters eg 'Actors'

Hence we get our full entry which both computers and people can understand ...
IPTC : 10170100 : Arts, Culture & Entertainment : Theatre : Actors
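The five-part structure described above can be checked and rebuilt mechanically. The sketch below parses an entry, splits the SRN into its 2-, 3- and 3-digit parts, enforces the 64-character name limit and the 236-byte overall limit from Table 1, and shows how a receiving system might expand a bare 'IPTC:10170100' into the full form. The class, the function names and the one-entry SUBJECT_NAMES table are hypothetical; a real system would consult the IPTC's list of approved numbers, and the example number is the article's own fictitious one.

```python
from dataclasses import dataclass

@dataclass
class SubjectReference:
    ipr: str                # Information Provider Reference, eg "IPTC"
    srn: str                # 8-digit Subject Reference Number
    subject_name: str = ""  # optional, up to 64 characters
    matter_name: str = ""   # optional, up to 64 characters
    detail_name: str = ""   # optional, up to 64 characters

    def parts(self) -> tuple[str, str, str]:
        """Split the SRN into subject (2 digits), matter (3) and detail (3)."""
        return self.srn[:2], self.srn[2:5], self.srn[5:]

    def to_field(self) -> str:
        """Join the parts with colons, dropping empty trailing optional names."""
        fields = [self.ipr, self.srn, self.subject_name, self.matter_name, self.detail_name]
        while len(fields) > 2 and not fields[-1]:
            fields.pop()
        return ":".join(fields)

def parse_subject_reference(text: str) -> SubjectReference:
    fields = [f.strip() for f in text.split(":")]
    if not 2 <= len(fields) <= 5:
        raise ValueError("expected 2 to 5 colon-separated parts")
    ipr, srn, names = fields[0], fields[1], fields[2:]
    if not ipr:
        raise ValueError("the IPR part is mandatory")
    if len(srn) != 8 or not srn.isdigit():
        raise ValueError("the SRN must be exactly 8 digits")
    if any(len(n) > 64 for n in names):
        raise ValueError("each name is limited to 64 characters")
    ref = SubjectReference(ipr, srn, *names)
    if len(ref.to_field().encode("utf-8")) > 236:  # Table 1 limit for 2:12, UTF-8 assumed
        raise ValueError("field exceeds 236 bytes")
    return ref

# Hypothetical lookup table; its one entry is the article's own fictitious example.
SUBJECT_NAMES = {"10170100": ("Arts, Culture & Entertainment", "Theatre", "Actors")}

def expand(ref: SubjectReference) -> SubjectReference:
    """Fill in the human-readable names for a bare 'IPR:SRN' entry, if known."""
    names = SUBJECT_NAMES.get(ref.srn)
    if names is None or ref.subject_name:
        return ref
    return SubjectReference(ref.ipr, ref.srn, *names)

full = parse_subject_reference("IPTC : 10170100 : Arts, Culture & Entertainment : Theatre : Actors")
print(full.parts())  # ('10', '170', '100')
print(expand(parse_subject_reference("IPTC:10170100")).to_field())
# IPTC:10170100:Arts, Culture & Entertainment:Theatre:Actors
```

Stripping the spaces around each colon lets the spaced form used in this article and the unspaced form in Table 1 parse to the same entry.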

In conclusion

It might seem from the above that there is little about the IPTC standard that is in fact standard, and to some extent this is true. But many end users' systems, such as those used in the newspaper and magazine markets, expect to receive digital pictures captioned in the IPTC format. What can be concluded from the above is that, in order to implement the standard, picture libraries need to compromise. On the one hand, sufficient captioning information needs to be provided so that images can be readily identified and catalogued. On the other hand, too much information increases the time it takes to ready pictures for delivery.

At Camera Press in London, England, for example, the scope of the available PhotoShop fields has proven insufficient for internal use, and extra fields, both IPTC fields and fields specific to Camera Press, have had to be used. Internally, all images are catalogued in a Phraséa database, which creates a separate text file for the caption but can also read PhotoShop's internal caption during the archiving process. Those who access the Phraséa database directly to download pictures get the best of both worlds: the full text file and the picture file's internal caption, which can be read by PhotoShop. Accessing pictures via the World Wide Web, however, delivers only the file's internal caption.

It is unfortunate that many end users underestimate the significance of captioning structures; after all, it's the digital image that they are interested in. But designing suitable captioning structures, as we have seen above, is a tricky business. Get it wrong at the outset and it becomes increasingly difficult to correct your mistakes as your archive grows. Reworking thousands of captions is not something that you want to do too often!

If you are about to design a caption structure it is important to bear in mind the following ...

1. The overwhelming majority of your customers will download the picture file ONLY, and many will not even bother to read the text in PhotoShop's 'File Info' window, so notes such as special prices may go unnoticed!

2. If you define fields that are additional to those that PhotoShop can understand they will most likely not be transmitted to the end user. This can be either a problem or a benefit depending on your caption structure.

3. It is not advisable to use the 'Category' and 'Sub-Category' fields, as the standard now deprecates them and they are likely to be dropped from future versions. Since these fields are potentially so important, it is essential to understand how the new 'Subject Reference' field works. The full standard can be obtained from http://www.iptc.org/iptc
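Point 2 can be made concrete by separating a library's caption record into the fields PhotoShop's 'File Info' can show and those that will probably stay in-house. This is a sketch only: the field names loosely follow Table 2 and their exact spellings are assumptions, and 'Restrictions' stands in for the kind of library-specific field discussed earlier.

```python
# Field names loosely following Table 2; the exact spellings are assumptions.
PHOTOSHOP_FIELDS = {
    "Caption", "Caption Writer", "Headline", "Special Instructions",
    "Keywords",
    "Category", "Supplemental Categories", "Urgency",
    "Byline", "Byline Title", "Credit", "Source",
    "Object Name", "Date Created", "City", "Province-State",
    "Country Name", "Original Ref",
    "Copyright Notice", "URL",
}

def split_for_delivery(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate fields PhotoShop can show from fields likely to stay in-house."""
    embedded = {k: v for k, v in record.items() if k in PHOTOSHOP_FIELDS}
    in_house = {k: v for k, v in record.items() if k not in PHOTOSHOP_FIELDS}
    return embedded, in_house

record = {
    "Caption": "Diana at the Proms",
    "Byline": "A. Photographer",
    "Restrictions": "Not for use in advertising",
}
embedded, in_house = split_for_delivery(record)
print(sorted(embedded))  # ['Byline', 'Caption']
print(in_house)          # {'Restrictions': 'Not for use in advertising'}
```

In this example the 'Restrictions' field lands in the in-house set, so it would not travel with the picture; if customers must see it, its text would have to go into a field PhotoShop displays, such as 'Special Instructions', subject to that field's length limit.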

Ian Lynch is an independent Phrasea Specialist in the UK and can be contacted at ian@lanmarque.co.uk