Digital Picture Captioning
A discussion of some of the issues associated with the captioning of digital images.
I. Lynch, Jan '99 (source: http://www.cepic.org/iptc.htm)
Unlike textual data, whose content can be indexed and hence searched at a later time, digital picture data is nothing more than a representation of the pixels that make up the image. Beyond some purely experimental projects, there are currently no methods to distinguish a picture of, say, a cup from one of a spaceship by examining just the pixel data. Hence we typically associate or 'bind' some text to a digital image file so that searching for words within the bound text can locate its associated image.
For small quantities of image files this text association may not be necessary, as we can simply give each picture file a descriptive file name. (Even so, an increasing number of publishers insist that all digital pictures be supplied with a textual description.) However, this method rapidly becomes impracticable as the quantity of files increases. For larger quantities of files it is necessary to employ a database to manage their archiving and retrieval. Such databases, known in this context as media archives, 'catalogue' large numbers of image files by recording in lists (indexing) each word in the associated textual description.
For a digital picture archive whose content is used exclusively 'in-house', the textual description can take any form so long as sufficient information is recorded for images to be successfully retrieved. But for archives whose content is destined for 'outside' consumers, such as in the publishing industry, an unstructured or proprietary description structure is not sufficient, as many consumers today maintain their own digital archives. It is therefore necessary not only to caption digital pictures adequately but also to caption them with a structure that is acceptable to the widest possible audience.
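The word indexing that a media archive performs, as described above, can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how any particular archive product works. The file names, captions and function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(captions):
    """Map each lower-cased word to the set of image files whose caption contains it."""
    index = defaultdict(set)
    for filename, caption in captions.items():
        for word in re.findall(r"[a-z0-9]+", caption.lower()):
            index[word].add(filename)
    return index


def search(index, query):
    """Return the files whose captions contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical file names and captions, for illustration only.
captions = {
    "img001.jpg": "A white china cup on a wooden table",
    "img002.jpg": "Spaceship model on display at a museum",
    "img003.jpg": "Cup final crowd outside the stadium",
}
index = build_index(captions)
print(sorted(search(index, "cup")))        # ['img001.jpg', 'img003.jpg']
print(sorted(search(index, "cup table")))  # ['img001.jpg']
```

Note that the pixel data is never examined. An image can only be found through the words bound to it, which is why the quality and structure of the caption matter so much.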
Whenever information technology is adopted by a market sector, its growth is always followed by a need for standardisation so that data can be easily interchanged between differing systems. The Internet, for example, has only been able to sustain a phenomenal rate of growth because data is interchanged using a platform-independent standard known as 'HTML'. In medicine, digitised X-rays and other medical images can be interchanged across platforms because they are captioned with a format known as 'DICOM'. In the publishing industry, however, things are not so well defined. Perhaps this is because publishing is such a diverse activity that no one standard will cover all needs. For digital still pictures, however, there are two possible standards: one ratified jointly by the International Press Telecommunications Council and the Newspaper Association of America (IPTC-NAA), and a sub-set of the IPTC standard that has itself become a de facto standard because of the widespread use of Adobe's PhotoShop.
The IPTC-NAA standard was developed by a consortium of world-wide 'content' providers such as Reuters and AP, whose services deliver a range of products including news, financial data, graphics, still pictures, movie and audio clips, over a range of transport mechanisms such as satellite, wide area networks, cable and direct links. The standard is therefore constructed in two parts: the first dealing with the routing and delivery of the content, and the second dealing with how the content is described. In IPTC terms, what gets delivered is not known as a picture, text file, audio clip or anything else that specific. It is known simply as an 'Object', and the addressing information needed to route and deliver the object is known as that object's 'envelope'.
Part One of the IPTC standard, the format of the envelope, is really only of interest to the likes of Reuters and to end users such as national newspapers, broadcasters and financial institutions, as it aids in the automatic delivery of items to the right 'desk'. It has a lesser relevance in the context of captioning digital pictures for photo libraries. It is to Part Two of the standard that many picture suppliers turn in order to structure their captions. This is by no means universal but since, as we shall see later, PhotoShop uses a subset of Part Two, it's about the nearest thing available to common practice.
Table 1 below lists the complete IPTC standard fields. Although this might look complicated, remember that all we are trying to do here is to bring some standardisation to how we structure the fields of a database. Columns one and two represent the field names: the numbers are used by the database and the names are used for the field headings. Column three gives a brief description of how the field should be used, i.e. what sort of content the field should hold. The fourth column defines what type of data the field should contain (text, dates etc.).
Notice that many fields do not contain 'words' but codes which represent entries in standardised lists. For example, the Category field is not a text string but three alpha characters.
Table 1. THE IPTC STANDARD FIELDS PART 1
The IPTC standard is now so well established in the digital picture business that Adobe™ considered it important enough to incorporate into PhotoShop™, the industry's prime digital picture manipulation application. PhotoShop, though, uses only a small percentage of the total number of IPTC fields available. To access its IPTC facilities, users open a picture and select 'File Info'. The fields then made available are grouped into sections as follows.
Table 2. Adobe Photoshop 'File Info' Fields
Given the almost universal use of PhotoShop by picture consumers, it makes sense to consider its captioning structure when constructing databases for use in photo libraries.
Which fields to use?
As a sub-set of the full standard, the PhotoShop fields are a good place to start, but there are some significant drawbacks and limitations for photo libraries. Firstly, the scope of the headings is often insufficient. Libraries often need to record additional information such as 'Restrictions', whereby images may have limitations imposed upon their sale due to copyright, content, exclusivity or other commercial reasons. Although the 'Special Instructions' field might seem appropriate for this use, additional fields are often required. Secondly, the type and length of data that can be entered into each field can lead to problems: the 'Caption' fields, for example, can only take up to 255 characters, and whilst we might want to use some text in the 'Category' field we can't, because this field will only accept a three-character code. And finally, the IPTC standard has moved on since PhotoShop incorporated this scheme, and the standard now officially discourages the use of the 'Category' and 'Sub-Category' fields. Rather than continuing to let picture suppliers define their own categories and sub-categories, the standard now attempts to predefine what these categorisations should be. It's debatable whether or not this is an improvement, but the fact remains that the standard and PhotoShop are now in conflict.
Storing the caption information
There are three principal places to store a picture's caption information: as a separate text file, as part of the JPEG data stream or, for Macintosh files, as one or more 'resources'. Saving the caption as a separate text file is the most flexible method, as it imposes no particular limitations on the textual data and text files can be read on all computer platforms.
Its drawback, though, is that two files must be delivered and managed for each picture. Saving the caption as part of the JPEG data stream avoids having two files and is potentially suitable for both Macintosh and PCs. But the problem is where precisely in the data stream to store the information. PhotoShop stores its caption at the start of the stream, other products store theirs at the end of the data stream, and just about any other place can be encountered. Whilst PhotoShop is widespread and can itself import picture files from a number of other applications, it is not always certain that the caption data will be in a place that PhotoShop can understand.
Saving the caption as a resource is common practice, but only in as much as image manipulation has until recently been done mostly on Macintosh computers. But what do we mean by a 'resource'? Unlike files generated on PCs, Macintosh files are actually made up of two parts, even though to the outside world they look like single files. These two parts are known in programming terms as 'forks': one for the data and one for the file's 'resources'. Resources may at first seem to be a rather abstract concept due to their invisibility at the operating system level. But consider for a moment a file that is an application or programme. Programmes need to put up dialogue boxes and messages. They need to open windows and have buttons etc. It is convenient to gather all these functions together in the resource fork, as it makes it easier both to maintain and to edit them (translating all the English into another language, for example). For data files the situation is far less complicated but none the less useful. File resource forks can contain such things as custom icons, file version information and, in our case, the caption for the picture data. If a file is opened with an application that is capable of displaying the resource fork, various resources would be observed, each with a name and a number.
The icons that you see on your Mac, for example, vary according to the view that you have selected: list, small icons or large icons. These mini pictures are stored in a file's icon resource, which would have a number something like -16455. We might expect therefore that our caption will be contained in some sort of caption resource. This is true, and it's not. Unfortunately, although PhotoShop stores its caption information mostly in a resource known as the ANPA (IPTC) resource, not all of it is stored there, and furthermore other applications may store their captions in entirely different resources.
Why the Category field is a problem
The current IPTC standard defines categories (i.e. picture categories) in such a way that it is up to the service providers to come up with schemes of their own. What the standard originally provided was a three-character code followed by a 64-character description field, plus one other 64-character field intended to refine these categories: the Supplemental Category field. Thus a provider would design its own scheme along the lines of, say ..
AAA = Sport
The standard now says of these fields: "Use of these fields is deprecated. It is likely that these fields will not be included in further versions of the standard". Note: PhotoShop still uses this scheme!
Subject Reference fields
A typical Subject Reference field would look like ..
IPTC : 10170100 : Arts, Culture & Entertainment : Theatre : Actors
Notice that there are five parts to this field, each separated by a colon. The first two parts are mandatory and the remaining three are optional.
Subject Reference Part 1 - IPR
This part provides a mandatory 'Information Provider Reference'. Unless Camera Press becomes a registered provider with the IPTC, along with the likes of Agence France Presse ('AFP'), Associated Press ('AP') or Reuters ('Reuters'), this field should contain 'IPTC'.
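As an illustration, the five-part, colon-separated layout shown above can be checked and taken apart with a short Python sketch. It also applies the two-, three- and three-digit split of the Subject Reference Number and the 64-character limit on the optional names, both of which are covered in the descriptions of the remaining parts. The class and function names are hypothetical and are not taken from the IPTC standard. The sketch makes no attempt to check the number against the IPTC's approved list.

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_NAME_LENGTH = 64  # limit the article gives for each optional name part


@dataclass
class SubjectReference:
    """Hypothetical container for the five parts of a Subject Reference field."""
    ipr: str                                   # Part 1: Information Provider Reference
    srn: str                                   # Part 2: 8-digit Subject Reference Number
    subject_name: Optional[str] = None         # Part 3 (optional)
    subject_matter_name: Optional[str] = None  # Part 4 (optional)
    subject_detail_name: Optional[str] = None  # Part 5 (optional)

    @property
    def subject(self) -> str:
        return self.srn[:2]   # first two digits: broad subject

    @property
    def matter(self) -> str:
        return self.srn[2:5]  # next three digits: subject matter

    @property
    def detail(self) -> str:
        return self.srn[5:8]  # last three digits: subject detail


def parse_subject_reference(text: str) -> SubjectReference:
    """Split a colon-separated Subject Reference field and validate its parts."""
    parts = [part.strip() for part in text.split(":")]
    if not 2 <= len(parts) <= 5:
        raise ValueError("expected between two and five colon-separated parts")
    ipr, srn, *names = parts
    if not ipr:
        raise ValueError("the Information Provider Reference is mandatory")
    if not re.fullmatch(r"[0-9]{8}", srn):
        raise ValueError("the Subject Reference Number must be exactly 8 digits")
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError("names are limited to 64 characters")
    # Pad missing optional parts and treat empty ones as absent.
    padded = [name or None for name in names] + [None] * (3 - len(names))
    return SubjectReference(ipr, srn, *padded)


ref = parse_subject_reference(
    "IPTC : 10170100 : Arts, Culture & Entertainment : Theatre : Actors"
)
print(ref.subject, ref.matter, ref.detail)  # 10 170 100
print(ref.subject_name)                     # Arts, Culture & Entertainment
```

A field holding only the two mandatory parts, such as 'IPTC : 10170100', parses in the same way, with the three name parts left empty.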
Subject Reference Part 2 - SRN
The Subject Reference Number is an 8-digit number which completely takes over from the 'Category' and 'Sub-Category' fields. However, not any old number will do here. The IPTC maintains a list of approved numbers which relate to specific subjects, so this list needs to be consulted each time a picture is captioned - and it's a long list! Furthermore, unless you as a picture supplier become a registered provider with the IPTC, you have no say in what the list of subject matter contains; you must use the subjects already defined.
The Subject Reference Number is itself built up from three parts.
The first two digits come from a list of subject numbers which match broad categories of subject matter, e.g.
The next three digits are taken from a table which further classifies the preceding section.
The last three digits are taken from a table which provides more detail to the preceding section (this is actually a fictitious example, as the table of classifications was not available).
And so our Subject Reference Number would look as follows ...
Subject Reference Field Example
Now, it would be perfectly legitimate when captioning a picture to enter in the Subject Reference field nothing more than ..
The computers of customers with automated picture reception facilities would be able to understand the subject matter and detail of this picture exactly. But those less fortunate customers would have no idea what the heck '10170100' meant. For this reason the Subject Reference field can have three optional parameters, each separated by a colon, which provide textual interpretations of the Subject Reference Number.
Subject Reference Part 3 - Subject Name
Up to 64 characters, e.g. 'Arts, Culture & Entertainment'
Subject Reference Part 4 - Subject Matter Name
Up to 64 characters, e.g. 'Theatre'
Subject Reference Part 5 - Subject Detail Name
Up to 64 characters, e.g. 'Actors'
Hence we get our full entry which both computers and people can
understand ...
In conclusion
It might seem from the above that there is little about the IPTC standard that is in fact standard, and to some extent this is true. But many end users' systems, such as those used in the newspaper and magazine markets, expect to receive digital pictures captioned under the IPTC format. What can be concluded from the above is that in order to implement the standard, picture libraries need to compromise. On the one hand, sufficient captioning information needs to be provided so that images can be readily identified and catalogued. On the other hand, too much information increases the time it takes to ready pictures for delivery.
At Camera Press in London, England, for example, the scope of the PhotoShop fields available has proven to be insufficient for internal use, and extra fields (both IPTC fields and fields specific to Camera Press) have had to be used. Internally all images are catalogued in a Phraséa database, which creates a separate text file for the caption, but Phraséa is also able to read PhotoShop's internal caption during the archiving process. Those who access the Phraséa database directly to download pictures get the best of both worlds: the full text file and the picture file's internal caption, which can be read by PhotoShop. Accessing pictures via the world wide web, however, only delivers the file's internal caption.
It is unfortunate that many end users underestimate the significance of captioning structures; after all, it's the digital image that they are interested in. But designing suitable captioning structures, as we have seen above, is a tricky business. Get it wrong at the outset and it becomes increasingly difficult to correct your mistakes as your archive grows. Reworking thousands of captions is not something that you want to do too often! If you are about to design a caption structure it is important to bear in mind the following ...
1. The overwhelming majority of your customers will download the picture file ONLY, and many will not even bother to read the text within PhotoShop's 'File Info' window, e.g. special prices!
2. If you define fields that are additional to those that PhotoShop can understand, they will most likely not be transmitted to the end user. This can be either a problem or a benefit depending on your caption structure.
3. It is not advisable to use the 'Category' and 'Sub-Category' fields, as these are no longer supported by the standard. Since these fields are potentially so important, it is essential to understand how the new 'Subject Reference' field works.
The full standard can be obtained from http://www.iptc.org/iptc
Ian Lynch is an independent Phraséa Specialist in the UK and can be contacted at ian@lanmarque.co.uk