Friday, 10 Ordibehesht 1389

Setting the Stage

Metadata, literally “data about data,” has become a widely used yet still frequently underspecified term that is understood in different ways by the diverse professional communities that design, create, describe, preserve, and use information systems and resources. It is a construct that has been around for as long as humans have been organizing information, albeit transparently in many cases, and today we create and interact with it in increasingly digital ways. For the past hundred years at least, the creation and management of metadata has primarily been the responsibility of information professionals engaged in cataloging, classification, and indexing; but as information resources are increasingly put online by the general public, metadata considerations are no longer solely the province of information professionals. Although metadata is arguably a much less familiar term among creators and consumers of networked digital content who are not information professionals per se, these same individuals are increasingly adept at creating, exploiting, and assessing user-contributed metadata such as Web page title tags, folksonomies, and social bookmarks.

Schoolchildren and college students are taught in information literacy programs to look for metadata such as provenance and date information in order to ascertain the authoritativeness of information that they retrieve on the Web. Thus it has become more important than ever that not only information professionals but also other creators and users of digital content understand the critical roles of different types of metadata in ensuring accessible, authoritative, interoperable, scalable, and preservable cultural heritage information and record-keeping systems.

Until the mid-1990s, metadata was a term used primarily by communities involved with the management and interoperability of geospatial data and with data management and systems design and maintenance in general. For these communities, metadata referred to a suite of industry or disciplinary standards as well as additional internal and external documentation and other data necessary for the identification, representation, interoperability, technical management, performance, and use of data contained in an information system.

Perhaps a more useful, “big picture” way of thinking about metadata is as the sum total of what one can say about any information object at any level of aggregation.¹ In this context, an information object is anything that can be addressed and manipulated as a discrete entity by a human being or an information system. The object may comprise a single item, it may be an aggregate of many items, or it may be the entire database or record-keeping system. Indeed, in any given instance one can expect to find metadata relevant to any information object existing simultaneously at the item, aggregation, and system levels.

In general, all information objects, regardless of the physical or intellectual form they take, have three features (content, context, and structure), all of which can and should be reflected through metadata.

Content relates to what the object contains or is about and is intrinsic to an information object.

Context indicates the who, what, why, where, and how aspects associated with the object’s creation and is extrinsic to an information object.

Structure relates to the formal set of associations within or among individual information objects and can be intrinsic or extrinsic or both.
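As a rough illustration only, the three features and the levels of aggregation described above might be modeled as follows. The class name, field names, and sample values are invented for this sketch and do not come from any metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """Hypothetical model of an information object and its metadata."""
    identifier: str
    level: str  # "item", "aggregation", or "system"
    # Content: what the object contains or is about (intrinsic).
    content: dict = field(default_factory=dict)
    # Context: who, what, why, where, and how of its creation (extrinsic).
    context: dict = field(default_factory=dict)
    # Structure: associations within or among objects (intrinsic, extrinsic, or both).
    structure: dict = field(default_factory=dict)
    # Child objects, so that an aggregate can hold items.
    parts: list = field(default_factory=list)


def identifiers_by_level(obj):
    """Group the identifiers of obj and all its parts by level of aggregation."""
    levels = {}
    stack = [obj]
    while stack:
        current = stack.pop()
        levels.setdefault(current.level, []).append(current.identifier)
        stack.extend(current.parts)
    return levels


# Invented sample data: one letter (item) inside one series (aggregation).
letter = InformationObject(
    identifier="letter-001",
    level="item",
    content={"subject": "harvest report"},
    context={"creator": "J. Smith", "date": "1923-05-01"},
    structure={"part_of": "series-A"},
)
series = InformationObject(
    identifier="series-A",
    level="aggregation",
    context={"creator": "Smith family"},
    structure={"has_part": ["letter-001"]},
    parts=[letter],
)

print(identifiers_by_level(series))
```

The point of the sketch is simply that metadata about the same material can exist at the item and aggregation levels at once, each with its own content, context, and structure.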

Cultural heritage information professionals such as museum registrars, library catalogers, and archival processors often apply the term metadata to the value-added information that they create to arrange, describe, track, and otherwise enhance access to information objects and the physical collections related to those objects. Such metadata is frequently governed by community-developed and community-fostered standards and best practices in order to ensure quality, consistency, and interoperability. The following Typology of Data Standards organizes these standards into categories and provides examples of each. Markup languages such as HTML and XML provide a standardized way to structure and express these standards for machine processing, publication, and implementation.

Library metadata development has been first and foremost about providing intellectual and physical access to collection materials. Library metadata includes indexes, abstracts, and bibliographic records created according to cataloging rules (data content standards) such as the Anglo-American Cataloguing Rules (AACR) and data structure standards such as the MARC (Machine-Readable Cataloging) format, as well as data value standards such as the Library of Congress Subject Headings (LCSH) or the Art & Architecture Thesaurus (AAT). Such bibliographic metadata has been systematically and cooperatively created and shared since the 1960s and made available to repositories and users through automated systems such as bibliographic utilities, online public access catalogs (OPACs), and commercially available databases. Today this type of metadata is created not only by humans but also in automated ways through such means as metadata mining, metadata harvesting, and Web crawling. Automation of metadata will inevitably continue to expand with the development of the Resource Description Framework (RDF) and the Semantic Web, which are discussed later in this book.

A large component of archival and museum metadata creation activities has traditionally been focused on context. Elucidating and preserving context is what assists with identifying and preserving the evidential value of records and artifacts in and over time; it is what facilitates the authentication of those objects, and it is what assists researchers with their analysis and interpretation. Archival and manuscript metadata (more commonly referred to as archival description) includes accession records, finding aids, and catalog records. Archival data structure standards that have been developed in the past three decades include the MARC Archival and Manuscripts Control (AMC) format, published by the Library of Congress in 1984 (now integrated into the MARC21 format for bibliographic description); the General International Standard Archival Description (ISAD(G)), published by the International Council on Archives in 1994; Encoded Archival Description (EAD), adopted as a standard by the Society of American Archivists (SAA) in 1999, and its companion data content standard, Describing Archives: A Content Standard (DACS), first published in 2004. The Metadata Encoding and Transmission Standard (METS), developed by the Digital Library Federation and maintained by the Library of Congress, is increasingly being used for encoding descriptive, administrative, and structural metadata and digital surrogates at the item level for objects such as digitized photographs, maps, and correspondence from the collections described by finding aids and other collection- or group-level metadata records. While archival metadata was primarily only available locally at individual repositories until the late 1990s, it is now distributed online through resources such as OCLC (Online Computer Library Center),² Archives USA,³ and EAD-based resources such as the Online Archive of California and the Library of Congress’s American Memory Project.

Consensus and collaboration have been slower to build in the museum community, where the benefits of standardization of description such as shared cataloging and exchange of descriptive data were less readily apparent until relatively recently. Since the late 1990s, tools such as Categories for the Description of Works of Art (CDWA), Spectrum, the CIDOC Conceptual Reference Model, Cataloging Cultural Objects (CCO), and the CDWA Lite XML schema have begun to be considered and implemented by museums. Initiatives such as Museums and the Online Archive of California (MOAC) have examined the applicability and extensibility of descriptive standards developed by archives and libraries such as EAD and METS to museum holdings in order to address the integration of cultural information across repository types, as well as the educational needs of users visiting online museum resources.

Although it would seem to be a desirable goal to integrate materials of different types that are related by provenance or subject but distributed across museum, archives, and library repositories, initiatives such as MOAC have met with only limited success. As MOAC and the mid-1980s development of the now-defunct MARC AMC format have demonstrated, the distinctiveness of the various professional and object-based approaches (e.g., widely differing notions of provenance and collectivity as well as of structure) and the different institutional cultures have left many professionals feeling that their practices and needs have been shoehorned into structures that were developed by another community with quite different practices and users. As enunciated in Principle 6 of “Practical Principles for Metadata Creation and Maintenance” (p. 72), there is no single metadata standard that is adequate for describing all types of collections and materials; selection of the most appropriate suite of metadata standards and tools, and creation of clean, consistent metadata according to those standards, not only will enable good descriptions of specific collection materials but also will make it possible to map metadata created according to different community-specific standards, thus furthering the goal of interoperability discussed in subsequent chapters of this book.

An emphasis on the structure of information objects in metadata development by these communities has perhaps been less overt. However, structure has always been important in information organization and representation, even before computerization. Documentary and publication forms have evolved into industry standards and societal norms and have become an almost transparent information management tool. For example, when users access a birth certificate they can predict its likely structure and content. When academics use a scholarly monograph, they understand intuitively that it will be organized with a table of contents, chapter headings, and an index. Archivists use the physical structure of their finding aids to provide visual cues to researchers about the structural relationships between different parts of a record series or manuscript collection. Archival description also exploits the hierarchical arrangement of records according to the bureaucratic hierarchies and business practices of the creators of those records. However, in recent years there has been increasing criticism that while valuable for retaining context and original order, collection-level, hierarchical metadata as exemplified in archival finding aids privileges the scholarly user of the archive (and those who are familiar with the structure and function of archival finding aids) while leaving the nonexpert user baffled, as well as unnecessarily perpetuating a paper-based descriptive paradigm.

In the online world, multiple descriptive relationships between objects can be supported simultaneously, and some of these may more effectively support new types of users and uses in an environment that is not mediated by a reference archivist. Archives and other collecting institutions are beginning to explore methods of description that exploit item-level metadata for digitized objects so that users can search for specific items, navigate through a collection “bottom-up” as well as “top-down,” and collate related collection materials through lateral searching across collections and repositories.

The role of structure has been growing as computer-processing capabilities become increasingly powerful and sophisticated. Information communities are aware that the more highly structured an information object is, the more that structure can be exploited for searching, manipulation, and interrelating with other information objects. Capturing, documenting, and enforcing that structure, however, can only occur if supported by specific types of metadata. In short, in an environment where a user can gain unmediated access to information objects over a network, metadata

• certifies the authenticity and degree of completeness of the content;

• establishes and documents the context of the content;

• identifies and exploits the structural relationships that exist within and between information objects;

• provides a range of intellectual access points for an increasingly diverse range of users; and

• provides some of the information that an information professional might have provided in a traditional, in-person reference or research setting.

But there is more to metadata than description and resource discovery. A more inclusive conceptualization of metadata is needed as we consider the range of activities that may be incorporated into digital information systems. Repositories also create metadata relating to the administration, accessioning, preservation, and use of collections. Acquisition records, exhibition catalogs, licensing agreements, and educational metadata are all examples of these other kinds of metadata and data. Integrated information resources such as virtual museums, digital libraries, and archival information systems include digital versions of actual collection content (sometimes referred to as digital surrogates), as well as descriptions of that content (i.e., descriptive metadata, in a variety of formats). Incorporating other types of metadata into such resources reaffirms the importance of metadata in administering collections and maintaining their intellectual integrity both in and over time. Paul Conway alludes to this capability of metadata when he discusses the impact of digitization on preservation:

“The digital world transforms traditional preservation concepts from protecting the physical integrity of the object to specifying the creation and maintenance of the object whose intellectual integrity is its primary characteristic.”

When applied outside the original repository, the term metadata acquires an even broader scope. An Internet resource provider might use metadata to refer to information that is encoded in HTML meta tags for the purposes of making a Web site easier to find. Individuals who are digitizing images might think of metadata as the information they enter into a header field for the digital file to record information about the image file, the imaging process, and image rights. A social science data archivist might use the term to refer to the systems and research documentation necessary to run and interpret a magnetic tape containing raw research data. An electronic records archivist might use the term to refer to all the contextual, processing, preservation, and use information needed to identify and document the scope, authenticity, and integrity of an active or archival record in an electronic record-keeping or archival preservation system. Metadata is crucial in personal information management and for ensuring effective information retrieval and accountability in record keeping, something that is becoming increasingly important with the rise of electronic commerce and the use of digital content and tools by governments.

In all these diverse interpretations, metadata not only identifies and describes an information object; it also documents how that object behaves, its function and use, its relationship to other information objects, and how it should be and has been managed over time.

As this discussion suggests, theory and practices vary considerably due to the differing professional and cultural missions of museums, archives, libraries, and other information and record-keeping communities. Information professionals have a bewildering array of metadata standards and approaches from which to choose. Many highly detailed metadata standards have been developed by individual communities (e.g., MARC, EAD, the Australian Recordkeeping Metadata Schema (RKMS), and some of the standards for Geographic Information Systems) that attempt to articulate their mission-specific differences as well as to facilitate mapping between common data elements. If used appropriately and to their fullest extent, these standards have the potential to create extremely rich metadata that would provide detailed documentation of record-keeping creation and use in situations in which such activities may be challenged or audited for their comprehensiveness and accuracy. Creation and ongoing maintenance of such metadata, however, is complex, time consuming, and resource intensive and may only be justifiable when there is a legal mandate or other risk management incentive or when it is envisaged that the content and metadata may be reused or exploited in previously unanticipated ways, such as in digital asset management systems. By contrast, the Dublin Core Metadata Element Set (DCMES) identifies a relatively small, generic set of metadata elements that can be used by any community, expert or nonexpert, to describe and search across a wide variety of information resources on the World Wide Web. Such metadata standards are necessary to ensure that different kinds of descriptive metadata are able to interoperate with one another and with metadata from nonbibliographic systems of the kind that the data management communities and information creators are generating. Relatively lean metadata records such as those created using the DCMES have the advantage of being cheaper to create and maintain, but they may need to be augmented by other types of metadata in order to address the needs of specific user communities and to adequately describe particular types of collection materials.
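To make the trade-off between rich and lean records concrete, here is a minimal sketch of mapping a richer local record onto the fifteen DCMES elements. The local field names, the crosswalk table, and the sample record are all hypothetical, invented for illustration; real crosswalks between community standards are considerably more involved.

```python
# The fifteen element names of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk from an invented local schema to DCMES elements.
CROSSWALK = {
    "object_title": "title",
    "maker": "creator",
    "subject_terms": "subject",
    "date_created": "date",
    "accession_number": "identifier",
    "rights_statement": "rights",
}


def to_dublin_core(record):
    """Split a local record into a lean DCMES view and the fields left unmapped."""
    dc, unmapped = {}, {}
    for field_name, value in record.items():
        element = CROSSWALK.get(field_name)
        if element in DCMES_ELEMENTS:
            # DCMES elements are repeatable, so values are collected in lists.
            dc.setdefault(element, []).append(value)
        else:
            unmapped[field_name] = value
    return dc, unmapped


# Invented sample record with one field that has no DCMES counterpart here.
local_record = {
    "object_title": "Harvest scene",
    "maker": "Unknown photographer",
    "date_created": "1923",
    "accession_number": "1999.12.3",
    "conservation_notes": "Silvering at edges",
}

dc_record, leftover = to_dublin_core(local_record)
print(dc_record)
print(leftover)
```

In this sketch the conservation note has no place in the lean record, which is the kind of information that would need to be carried by other, more specialized metadata.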

Another form of metadata that has recently begun to appear is user-created metadata, which has been gathering momentum in a variety of venues on the Web. Just as many members of the general public have participated in the development of Web content, whether through personal Web pages or by uploading photos onto Flickr or videos onto YouTube, they have also increasingly been getting into the business of creating, sharing, and copying metadata (albeit often unknowingly). Folksonomies that are created using specialized tagging tools in various Web-based communities in order to identify, retrieve, categorize, and promote Web content and the sharing of bookmarks through the practice of social bookmarking are examples of the burgeoning user-created metadata on the Web. Among the advantages of these approaches is that individual Web communities such as affinity groups or hobbyists may be able to create metadata that addresses their specific needs and vocabularies in ways that information professionals who apply metadata standards designed to cater to a wide range of audiences cannot. User-generated metadata is also a comparatively inexpensive way to augment existing metadata, with the cost and the sense of ownership shared among more parties than just those who create information repositories. The disadvantages of user-generated metadata relate to quality control (or lack thereof) and idiosyncrasies that can impede the trustworthiness of both metadata and the resource it describes and negatively affect interoperability between metadata and the resources it is intended to describe. Issues of interoperability are discussed in some detail in the third chapter of this book.

Anne J. Gilliland
