University of Virginia
University of Virginia Library
American Studies Information Community

Building an American Studies Information Community

Libraries of the twenty-first century must take on the challenge of making sense of the flood of print and digital information now overwhelming students and scholars. The University of Virginia Library has just scratched the surface of a great potential to build and sustain a comprehensive program to support excellence in teaching and research. Our objective in creating the Library of Tomorrow is to make all forms of information easily accessible to our faculty and students as well as an international audience of academic and non-academic users. *This proposal was funded in late 2001.*

Information Communities

Building on its strengths in user service, special collections, and digital initiatives, the Library is working to create the model University research library for the twenty-first century. The foundation of this model is a concept we are calling Information Communities.

An Information Community is a group of scholars, students, researchers, librarians, information specialists and citizens from similar or dissimilar fields, whose common link is a shared information need. This information need can be oriented around a subject, a field, a methodology, or a data type. The information can include text, data, digitized media, images, and formal and informal scholarly exchanges of ideas. Information Communities exist as a medium for bringing people together and making them aware of opportunities and resources. Community is fostered by personal communication, shared interests, shared research materials, shared tools, and shared standards. Information Communities add value to information, and offer opportunities for using information in new and different ways. Activities of the community can include creation of web-based materials, development of portable tools for enhancing access to the materials, and managing of conferences and publications. Information Communities foster innovation and spark new areas of research, and usually result in a tangible body of knowledge for consumers.

UVa Religious Studies professor David Germano, who has worked with the Library to build the Tibetan and Himalayan Digital Library, has described the basic structure: "an Information Community consists of the people (authors, publishers, and users), the collections (texts, images, videos, audio, and maps), and the tools provided for interacting with those collections. The Library is providing the technological, administrative, and organizational infrastructure for these collections, but relies on individual scholars and collaborative projects. Multimedia digital publication is at the heart of the Tibetan Information Community, which includes providing scholars and scholarly groups with digital tools as a framework enabling collaborative research that can then be published within the Library."

The Library is working to reshape its existing library services to support Information Communities and the expanded capabilities they will bring to instruction and research. The initial Information Communities will support the development of the initiatives recommended by the Virginia 2020 Commissions on the future of the University of Virginia. Our planning calls for developing the underlying structure and portals for up to five Information Communities in the coming year.


American Studies

The UVa Library holds one of the world's finest collections of rare books and manuscripts in American Literature and History, has been an international leader in digitizing documents and images from these collections, and hosts several innovative academic centers which are busy advancing this work. Our next major Information Community will be a cooperative endeavor across all the disciplines central to American Studies, gathering collections close at hand and building online scholarship with University faculty. When fully developed, the American Studies Information Community will bring together collections, students, and researchers from around the world, both online and onsite, reshaping the future of scholarship.

The Community initially will be based on existing local physical collections such as the Clifton Waller Barrett Library of American Literature, the Tracy W. McGregor Library of American History, and a host of other rare book, manuscript, and archive collections in the Special Collections Department.

A number of digital text and image collections and projects have already been created, such as:

There are also these map and dataset collections that will be valuable to the community:

We expect to partner with the University's Institute for Advanced Technology in the Humanities and the Virginia Center for Digital History, both located in Alderman Library, and to work closely with allied departments and centers, including an Institute still in the planning stages that is designed to bring American Studies scholars from around the world to Charlottesville for research and educational development.

Other possible early-stage collaborators include the Thomas Jefferson Foundation, the University's Bayly Art Museum, Virginia Tech, the Smithsonian American Art Museum, SOLINET, and the Digital Library Federation. While we have no illusions that what we are creating will be the only portal into the world of American Studies, there is no reason at this point to set a limit on the number and range of eventual participating members of the Community. Collaborative expansion is one of the prime strategies for growth for each Information Community.

Phase I Proposal: Creating an American Studies Portal to an Integrated Collection

While an Information Community will grow according to the nurturing efforts of its participants, the first stage must be the gathering of collections. In order to investigate possibilities for harvesting metadata and using the resources that they describe, we must first create an integrated collection of digital resources from which an American Studies community can be served. The whole concept of Information Communities depends on having a large, integrated general collection upon which rule-based software "lenses" can be focused, giving a specific community its best access to the resources. We have long been creating and acquiring appropriate digital collections that should provide a firm foundation upon which to build. What has been missing until recently is the infrastructure that will make true integration of collections possible and the necessary provision of customized views practical.

We have built a working digital object repository based on the Flexible Extensible Digital Object Repository Architecture (FEDORA). This architecture, developed by Carl Lagoze and Sandy Payette at Cornell University, can be used to manage and deliver a broad variety of digital resources. For details of our architecture and implementation of the system, see our article "Virginia Dons FEDORA: A Prototype for a Digital Object Repository" in the July/August 2000 issue of D-Lib Magazine (http://www.dlib.org/dlib/july00/staples/07staples.html). Currently, we have 250,000 digital objects in the repository, including many electronic texts. Some of these are fully transcribed and marked up, and others are sets of page image objects bound together by a more minimal structural metadata object. We also have a collection of EAD-encoded finding aids, some of which contain references to the etext image and text objects described above.

In addition to the images of text pages, we have several collections of images in our testbed. These include documentary photographs from our special collections and art, archeology, and architecture images that we purchased from vendors. The purchased images came with databases that we have been experimenting with processing and reformatting into a set of XML objects that provide the structural metadata to organize access in the digital library. The General Descriptive Modeling Scheme (GDMS) XML DTD that we are using is one that we developed. It is intended to model collections of digital resources in ways that are natural to the collections: architecture and archeological site image collections are organized around the structure of the site; collections of art images are organized by creator or other classification in the case of unknown creators.

After our catalogers took the original databases and massaged them a bit, we processed the data into the XML form with a script. We are working towards a model where we can pre-process data from outside sources, apply the human judgment needed, then post-process it into a usable form. Using an XML format makes it easy to retain the source of the information, so we can refresh the metadata from the original source while retaining our added information. We are very interested in experimenting with pre-processing techniques that allow a human to make judgments about large groups of resources that can be applied mechanically, working towards supporting a "bionic cataloger" who can process large collections of resources efficiently by using a variety of software aids.
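The point about retaining the source of each piece of information can be illustrated with a minimal sketch. It assumes a simplified record in which every field carries a `source` attribute; the element and attribute names here are hypothetical and are not taken from the GDMS DTD, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

def refresh_from_source(record, fresh_fields):
    """Replace vendor-supplied fields with fresh values; keep local additions."""
    # Drop every field that came from the outside source.
    # findall() returns a list, so removing while looping is safe.
    for field in record.findall("field[@source='vendor']"):
        record.remove(field)
    # Re-add the current vendor values, tagged with their origin.
    for name, value in fresh_fields.items():
        el = ET.SubElement(record, "field", {"name": name, "source": "vendor"})
        el.text = value
    return record

# Hypothetical record: one vendor field (with a typo) and one local addition.
record = ET.fromstring(
    "<record id='img-001'>"
    "<field name='title' source='vendor'>Monticelo, west front</field>"
    "<field name='subject' source='local'>American architecture</field>"
    "</record>"
)
refresh_from_source(record, {"title": "Monticello, west front"})
```

Because the locally added subject field is tagged separately, it survives the refresh untouched while the vendor title is replaced.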

Plan of Work

Our proposed plan is to proceed on three fronts: expanding our digital library testbed, developing a set of tools and procedures for an American Studies community, and developing the portal for that community. It is critical to activities on all three fronts that we have a community coordinator to support the activities of the community. We see teams of library staff forming around communities to provide various expertise, as well as advisory groups of faculty and other community members, and know that we will need one person to act as the coordinator. In this startup phase we think that we need someone who is conversant both with the content and with at least some of the technical issues of digital content development and delivery. The activities of the grant project will be coordinated by the Digital Library Research and Development group, with Thornton Staples, the director of the group, acting as the principal investigator.

Expanding the Testbed

To date, our digital library activities have centered on building a testbed that demonstrates solutions to particular technical problems. Our next priority is to apply what we have learned from our testbed activities to bring in a large collection that can support an American Studies community view. We plan to bring our Early American Fiction and Modern English etext collections into the repository and, where necessary, enhance metadata to ensure they are recognized as specifically American. This involves converting the texts to XML and refining the XSL stylesheets and other scripts from our initial testbed. It also involves restricting access for some of the collection, which requires that we develop an authentication and rights management infrastructure. Our main focus for further repository system development for the next year will be to do this by adding policies to objects and enforcing them through the system.
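As a rough illustration of what attaching a policy to an object and enforcing it could look like, the sketch below stores a set of permitted groups on each object and checks it at access time. This is only an assumed model for discussion: the class, the field names, and the group labels are hypothetical and do not describe the repository's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    pid: str
    # Groups allowed to view the object; an empty set means unrestricted.
    allowed_groups: set = field(default_factory=set)

def can_view(obj, user_groups):
    """Allow access if the object is unrestricted or the user shares a group."""
    return not obj.allowed_groups or bool(obj.allowed_groups & set(user_groups))

# Hypothetical objects: one open text and one restricted to a "uva" group.
open_text = DigitalObject("text:1")
licensed_text = DigitalObject("text:2", {"uva"})
```

Keeping the policy on the object itself means the same check applies no matter which community view or portal delivers the object.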

Our entire collection of EAD finding aids was included in the testbed, so those will require minimal work on the metadata. Our image collections from Special Collections are currently organized using FileMaker Pro databases. We propose to create EAD finding aids for those collections, add to the metadata, and develop some new XSL stylesheets and scripts. We would also like to consolidate a number of architectural image collections developed by faculty, normalize and enhance their accompanying databases, and convert them to GDMS XML objects.

To build up the art side of our collection we would turn to some outside sources. We are members of AMICO and we have had preliminary conversations with both Jennifer Trant of AMICO and Ricky Erway of RLG about including that collection in our repository. We would convert the database to GDMS objects and register the images as formal objects in our repository that would point to the files resident on the RLG server. We would do the same for the collections data from the Bayly Art Museum here at UVa. For both collections, this would mean integrating a broad collection of art from around the world into our digital library, from which the community could draw for American Studies.

We have also had conversations with Elizabeth Broun, at the Smithsonian American Art Museum (SAAM), and Rachel Allen, the director of the Research and Scholars Center at SAAM, about establishing a relationship with our American Studies community. From the digital collection point of view, that would include integrating their collection records as well as their Inventories of American Painting and Sculpture. The collections data would be handled in the same way as AMICO. The Inventory data would provide us with an interesting research opportunity to figure out how to integrate Z39.50 databases.

Developing Procedures and Collecting Tools

The second front for our work would be to take the first steps to develop procedures and tools to be used in developing a community view of our digital library collection. This requires: 1) developing a profile of a subject area that can be used to recognize a resource as related to the community in question and 2) testing methods of analyzing the resources and enriching the metadata for that community.

We would continue working on the "bionic cataloger" model that we have started using for our purchased resources. This work would concentrate on mechanically pre-analyzing data, presenting a relatively compact summary to a human for judgment, then feeding the results back to enhance the original data, storing the final results in our repository such that the original metadata record could be updated from its source while retaining our additional data. We definitely plan to provide some simple tools that the cataloger could use to do this, based on sorting and boiling down data so that judgment could be made and applied to large sets of metadata.
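The pre-analyze, judge, and feed back cycle described above can be sketched in a few lines. The example groups records by the distinct values of one field so that a cataloger sees a compact summary, then applies one judgment per value to every matching record. The record layout, the `local` key, the community label, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

def summarize(records, field):
    """Group record ids by the distinct values of one metadata field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec["id"])
    return dict(groups)

def apply_judgments(records, field, judgments):
    """Copy each per-value judgment onto every record that has that value."""
    for rec in records:
        decision = judgments.get(rec[field])
        if decision is not None:
            # Keep local additions apart from the source data.
            rec.setdefault("local", {})["community"] = decision
    return records

# Invented sample records standing in for a purchased image database.
records = [
    {"id": 1, "creator": "Homer, Winslow"},
    {"id": 2, "creator": "Homer, Winslow"},
    {"id": 3, "creator": "Turner, J. M. W."},
]
summary = summarize(records, "creator")  # what the cataloger reviews
judgments = {"Homer, Winslow": "american-studies"}  # the cataloger's decision
apply_judgments(records, "creator", judgments)
```

The cataloger makes one decision per distinct creator rather than one per record, which is where the savings on large collections would come from.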

One possibility that we will explore is to apply natural language expert systems software to the problem. The Alembic Workbench software (http://www.mitre.org/technology/alembic-workbench/), developed by the MITRE Corporation (and available to be used for free for internal purposes), appears to provide the basic functionality that we would need. Essentially, the system uses natural language processing to analyze textual data and build a knowledge base. A human user "trains" the system in steps by giving feedback, which is then used to enhance the knowledge base. We would like to exploit such a system to develop the profiles for each community to identify resources with a high probability of relevance. We will then add another profile for each community that could be used to enhance the metadata and increase its usefulness.
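To make the idea of a trained community profile concrete, here is a deliberately simple stand-in: a bag of term weights that a human nudges up or down by marking sample descriptions as relevant or not. It is not the Alembic Workbench and does not use its interfaces; the class, the scoring rule, and the sample titles are all hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase a description and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class Profile:
    """A bag of term weights adjusted by human feedback."""

    def __init__(self):
        self.weights = Counter()

    def score(self, text):
        # Sum the weights of the distinct terms; unseen terms count as 0.
        return sum(self.weights[t] for t in set(tokens(text)))

    def feedback(self, text, relevant):
        # Raise weights for terms in relevant items, lower them otherwise.
        step = 1 if relevant else -1
        for t in set(tokens(text)):
            self.weights[t] += step

profile = Profile()
profile.feedback("Slave narratives of antebellum Virginia", relevant=True)
profile.feedback("Roman mosaics of North Africa", relevant=False)
```

After these two rounds, a description that mentions Virginia scores above zero and one that mentions mosaics scores below it, while a term seen in both examples, such as "of", cancels out. A real system would need far richer language analysis than this.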

The bionic cataloger will be an important tool for continuing our work with enhancing the databases that accompany the digital collections that we buy. But it will be even more interesting to use these techniques on the pool of metadata being developed by the Open Archives Initiative (OAI). The OAI will finally give us a systematic way of locating publicly available digital resources, and we believe that our system will give us a way to integrate those resources and make good use of them in a manageable, scalable process. In this manner, we expect to be able to establish access to a large collection of publicly available electronic texts that we assume will be made available through OAI metadata from other Digital Library Federation members. We should also be able to tap into a rich lode of information about the current location of museum objects, and in some cases digital representations thereof, by harvesting the metadata that will be made available by the Consortium for the Computer Interchange of Museum Information (CIMI). In both of these cases, the harvested data would provide a very interesting application for our bionic cataloging system to recognize and classify resources that are of interest to an American Studies community.


Building the Portal

The third area that our work will concentrate on is building a portal for the community. We see a web-based portal as being the primary tool that the Library can use to host each of many information communities. Each community will have communication services, access to relevant collections and tools, and reference services provided through a portal. American Studies, by its very nature, is a decentralized field.

Much of the work in this area will center on making the portal as manageable as possible for the community coordinator. We plan to use a portal software package as the basis for our work, but we will need to build processes and databases that help automate some of the community services. For example, we have been experimenting with an XML database of bibliographic objects that represent online reference resources and that can be used to give the most up-to-date list when people go to our reference page on the web. We will explore associating those reference resources with a community, making it possible not only to provide community-specific reference pages automatically, but also to feature new references in a special section that automatically updates after a certain time period.
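A community-specific reference page with a self-expiring "new" section might be generated along the following lines. The record layout, the community labels, the placeholder titles, and the 30-day window are assumptions chosen for the sketch, not features of the existing XML database.

```python
from datetime import date, timedelta

def reference_page(resources, community, today, new_for=timedelta(days=30)):
    """Split one community's resources into 'new' and 'all' lists by title."""
    mine = [r for r in resources if community in r["communities"]]
    # A resource counts as new until it is older than the new_for window.
    new = [r["title"] for r in mine if today - r["added"] <= new_for]
    return {"new": sorted(new), "all": sorted(r["title"] for r in mine)}

# Placeholder resources tagged with hypothetical community labels.
resources = [
    {"title": "Resource A", "communities": {"american-studies"}, "added": date(2001, 11, 20)},
    {"title": "Resource B", "communities": {"american-studies", "tibetan"}, "added": date(2001, 6, 1)},
    {"title": "Resource C", "communities": {"tibetan"}, "added": date(2001, 11, 25)},
]
page = reference_page(resources, "american-studies", today=date(2001, 12, 1))
```

Because the "new" list is computed from the date each resource was added, items drop out of it on their own once the window passes, with no action needed from the coordinator.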

Some of the most important services that we would like to develop around the portal are those that support collaborative work of various kinds. Members of the community would be able to log in and be given the appropriate access and tools for adding objects to the collection, adding to metadata, posting notices, etc. We would like to provide specialized tools for teaching and research that are relevant to the members of the community. We envision the portal as a primary conduit for sharing resources among scholars from various departments here at Virginia and among members of the Information Community worldwide.


American Studies Information Community
Digital Access Services
University of Virginia Library
P.O. Box 400112
Charlottesville, VA 22904-4112

Maintained by: infocomm@virginia.edu
Last Modified: Tuesday, April 08, 2003
© The Rector and Visitors of the University of Virginia