IASSIST 2004 Conference, May 25-28

Abstracts for Thursday, 27 May 2004

Plenary
Building on 30 Years of Data Advocacy: Perspectives from Past Presidents
Chair: Ron Nakao

Alice Robbin (Indiana University Bloomington)
Ann Gray for Judith Rowe (Princeton University)
Thomas E. Brown (National Archives and Records Administration)
Charles Humphrey (University of Alberta)
Peter Burnhill (EDINA National Data Centre and University Data Library)
Ann Green (Yale University)

Throughout its history, IASSIST has been a focal point for data and for its quality, preservation, and accessibility. This panel of past Presidents will share their insights and experiences in a discussion of IASSIST's role as an advocate in the world of information services, data resources, and technology over the last three decades. Each panelist will present a few points about "advocacy" as initiated and/or implemented during his or her presidency. Looking to future directions, Ann Green, current IASSIST President, will comment on IASSIST's continued role of advocacy as outlined in the 2004 Strategic Plan. This session offers a unique and historic opportunity to learn a bit about IASSIST, past and present, from those who played a leading role in shaping the evolution of our association.


D1: When Metadata Standards Meet: Issues of Language and Interoperability
Chair: Jen Green

Can DDI Records Be Accurately Transformed to "Catalog-ready" MARC 21 Format?
Harrison Dekker (University of California, Berkeley)

One of the side effects of the increasingly digital nature of library collections is the "hidden resource" problem. As collections become more "virtual," traditional approaches to cataloging, for a variety of reasons, often fall short. As a result, it becomes hard, if not impossible, for users to locate these materials. At UC Berkeley, numerical data is one such hidden resource. A recent review of the numerical data holdings in the UC Berkeley Library catalog revealed that many of the library's data holdings were either cataloged inaccurately or not cataloged at all. Given the importance of numerical data sets in teaching and research, a solution was sought to redress these issues. Because of the scope and importance of the ICPSR data collection, it was given priority. After determining that a complete set of catalog records was not available, a decision was made to investigate whether ICPSR's freely available DDI-compliant XML metadata could be efficiently transformed to catalog-quality MARC 21 records. In this presentation, I'll discuss the outcome of the project, the technical details of the conversion process, and the problems encountered along the way.


Laying the Groundwork for Addressing Interoperability Issues between Geo-spatial Metadata Standards, the DDI and Dublin Core
Tony Mathys (EDINA National Data Centre and University Data Library, Edinburgh)
[Presented by Kenneth Miller (UK Data Archive, University of Essex)]

Recent approval of the ISO 19115 Geographic Information Metadata standard offers an opportunity to assess the relationship between geo-spatial and social science portals in terms of interoperability. Numerous social science datasets have a geo-spatial component, and measures are to be discussed and introduced over time to ensure that these datasets can be discovered through co-ordinate-based queries. Furthermore, the social sciences and geo-spatial technologies need to come together to ensure that a common element set is considered or that measures are taken to support cross-searches between geo-spatial and social science portals.

These are the challenges that have come to light during activities associated with the MADIERA project and the joint UK Data Archive (University of Essex) and EDINA (University of Edinburgh) geo-portal project. The MADIERA project is directed at providing a common integrated interface to the resources of the majority of the existing social science data archives in Europe. The geo-portal project is intended to provide a geo-data portal to serve as a resource discovery tool for the UK academic geo-spatial community.


Implementing an ISO/IEC 11179-3 Metadata Repository for Labour Market Data: Building Semantics through Data Structures
Rob Grim (Institute for Labour Studies, Tilburg University)
Jeroen Hoppenbrouwers (Institute for Labour Studies, Tilburg University)

The increasing demand for workflow documentation, needed to keep track of large numbers of statistical tables and of international comparative research, has prompted the Institute for Labour Studies (ILS) to implement a metadata repository. The ISO/IEC 11179-3 standard offers explicit guidelines for developing metadata registries. One of the core principles of an ISO/IEC 11179-3 metadata repository is the separation of a conceptual layer from a data representation layer. The paper describes the experience of the ILS in implementing the data structures needed to set up a registry for labour market data. The mapping of data element concepts to conceptual domains and data elements using a concept browser is illustrated. The paper further shows how the concept browser facilitates the management and navigation of knowledge domains in labour market research.
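
The separation of the two layers can be illustrated with a minimal sketch. It is not the ILS implementation: the class names, fields, and the sample employment-status codes are all hypothetical, and the model is a strong simplification of the ISO/IEC 11179-3 metamodel. A data element concept and its conceptual domain carry the meaning, a value domain carries one concrete coding of that meaning, and a data element ties the two together.

```python
from dataclasses import dataclass


@dataclass
class ConceptualDomain:
    """Conceptual layer: a set of meanings, independent of any coding."""
    name: str
    value_meanings: tuple


@dataclass
class DataElementConcept:
    """Conceptual layer: an object class plus one of its properties."""
    object_class: str
    prop: str
    conceptual_domain: ConceptualDomain


@dataclass
class ValueDomain:
    """Representation layer: concrete codes mapped to value meanings."""
    name: str
    conceptual_domain: ConceptualDomain
    permitted_values: dict  # code -> value meaning

    def __post_init__(self):
        # Every code must point at a meaning the conceptual domain knows.
        known = set(self.conceptual_domain.value_meanings)
        unknown = set(self.permitted_values.values()) - known
        if unknown:
            raise ValueError(f"meanings not in conceptual domain: {sorted(unknown)}")


@dataclass
class DataElement:
    """A concept paired with one concrete representation."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        if self.value_domain.conceptual_domain != self.concept.conceptual_domain:
            raise ValueError("value domain and concept use different conceptual domains")

    def meaning_of(self, code):
        return self.value_domain.permitted_values[code]


def elements_for(concept, registry):
    """Names of all data elements that represent the given concept."""
    return [e.name for e in registry if e.concept == concept]


# Hypothetical example: one concept, two surveys with different codings.
status = ConceptualDomain("employment status", ("employed", "unemployed", "inactive"))
concept = DataElementConcept("person", "employment status", status)
numeric = ValueDomain("numeric codes", status,
                      {"1": "employed", "2": "unemployed", "3": "inactive"})
letters = ValueDomain("letter codes", status,
                      {"E": "employed", "U": "unemployed", "I": "inactive"})
registry = [
    DataElement("survey_a.empstat", concept, numeric),
    DataElement("survey_b.status", concept, letters),
]

print(elements_for(concept, registry))
print(registry[0].meaning_of("2"), registry[1].meaning_of("U"))
```

Because both data elements share one concept, a browser over the conceptual layer can find comparable variables across surveys even though their codes differ.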


D2: Privacy, Security, and Information Today
Chair: Margo Anderson

Internet Surveillance: Recent U.S. Developments
Juri Stratford (University of California, Davis)


The U.S. Federal government has recently implemented both technologies and policies related to Internet surveillance. This paper looks at recent U.S. developments, including the Federal Bureau of Investigation's Carnivore software, new authorities relating to electronic evidence under the Patriot Act, and the Pentagon's Total Information Awareness Program.


An Empirical Examination of the Concern for Information Privacy Construct in the New Zealand Context
Ellen Rose (Institute of Information and Mathematical Sciences, Massey University)

Moore stated that "since societies differ, the desire or need for privacy will vary historically, from one society to another and among different groups in the same society." This study applies confirmatory factor analysis to a random sample of 459 New Zealanders to further examine the structure of the recently developed Concern for Information Privacy (CFIP) construct in a post-September 11 environment, in a similar Western society that has a different regulatory model for protecting the privacy of personal information. Because the sample demographics and the time of data collection differ, similar findings on CFIP's dimensions and its treatment as a second-order factor strengthen the results of previous empirical tests of the CFIP instrument developed by Smith et al. In addition, theoretical relationships between CFIP, consumer knowledge of current policy, regulatory preferences, negative experiences with private and government organizations, and different situations under which information might be revealed were examined, with the results showing some interesting differences. The New Zealand regulatory model is a middle ground between the strict directives of the European Union and the self-regulatory environment of the United States, which makes it an interesting context to study for those seeking to balance the needs of society, individuals, and international trade with respect to the privacy of personal information.


Data Archives in the Post 9/11 World
Thomas E. Brown (National Archives and Records Administration)

A key weapon in the war on terrorism is information. The information in data archives around the world is no exception. This presentation will explore how the U.S. National Archives is changing its access policies to the databases in its holdings that have become "records of concern." This includes evolving guidelines to identify those databases that need to be restricted. After concluding that certain databases may be records of concern, the Archives is limiting access to records previously available. But in the effort to make some information available, it is also trying to use techniques previously developed for protecting confidentiality of individuals to grant limited access to these databases of concern.


D3: Ensuring Data Quality: Aim High
Chair: Luuk Schreven

Elementary Data Quality Elements
Karsten Boye Rasmussen (University of Southern Denmark)

Data quality is obviously a good thing and an attractive goal to pursue. But what is data quality? The paper will give an overview of the literature on data quality and present the intuitive, the empirical and the ontological approaches that lead to a focus on dimensions or elements of data quality.

The context of the paper is data for use in the data warehouse. The first proposition is that data quality is not a static measure: although users should not change the data, their use of the data can build contextual information, or metadata. The second proposition is that this improved metadata can dynamically improve data quality even though the data are "frozen."


Meaning and Illusion in US Economic Statistics: A Case for Education and Restricted Access to Federal Statistical Microdata on Organizations
Martin David (University of Wisconsin - Madison)

Economic indicators are cited and analyzed by persons who know little of their accuracy or meaning. Net change in employment, percent change in GDP and productivity, and the level of Federal budget surplus evoke comment and action inconsistent with uncertainty in these estimates and their imperfect links to well-being, growth, and health of the economy. I present paradoxes in the meaning of these indicators and demonstrate gaps in users' understanding of underlying measurements.

Closing the gaps entails three efforts. 1) Data disseminators and archivists need to develop training modules and checklists to guide uninitiated users and stimulate questioning about epistemology. 2) Academics training professional economists and statisticians must increase training on measurement of economic activities. 3) Research access to statistical microdata archives on organizations must be substantially increased. That access entails increased documentation and reduced cost for scientific investigation of those microdata.

I explain how these thoughts led me to create the program of studies on economic statistics for the Joint Program in Survey Methodology (University of Maryland, University of Michigan, and Westat). Widespread understanding of the meaning of economic indicators will increase the productivity and relevance of research on those indicators.


Missing Data Allocation in the IPUMS: Minnesota Allocation Techniques and Customizable Tools for Researchers
Colin Davis (Minnesota Population Center)

The IPUMS (Integrated Public Use Microdata Series) software takes public use samples of census or survey microdata and, along with harmonizing variable categories, corrects logical inconsistencies and missing values. The U.S. Census Bureau has released public use samples for 1940 to the present in which missing values have been allocated and logical inconsistencies have been corrected. In contrast, historical samples of the U.S. Census (1850 through 1920) created by the Minnesota Population Center, as well as many modern international samples, must undergo missing data allocation to correct logical inconsistencies and missing values. The Minnesota Population Center has developed a second generation of data conversion software to produce all IPUMS data, including missing data allocation.

The original software allocated missing data in the U.S. samples 1850-1920. Our second generation software had to do the same, and also add as much extensibility as possible in order to accommodate future microdata projects. To this end, the new data conversion program interprets an "allocation table definition" that describes tables for a hot-deck donation and allocation procedure. This presentation will describe the technology and procedures used to allocate missing data at the MPC, including a demonstration of software that allows researchers to customize missing data allocation rules as desired.
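
A hot-deck procedure driven by a table definition can be sketched in a few lines. This is a generic sequential hot deck, not the Minnesota Population Center's software: the layout of the "allocation table definition", the variable names, and the sample records are assumptions made purely for illustration. Each record with an observed value becomes the current donor for its cell, and each record with a missing value receives the value of the most recent donor in the same cell.

```python
# Hypothetical allocation table definition: the variable to fill, the
# variables that define donor cells, and a fallback for cells with no donor.
ALLOCATION_TABLE = {
    "target": "occupation",
    "keys": ["sex", "age_group"],
    "default": "unknown",
}


def hot_deck_allocate(records, table):
    """Fill missing (None) target values from the latest donor in the same cell.

    Returns new records; each carries a flag that marks allocated values so
    that researchers can tell observed data from allocated data.
    """
    target, keys = table["target"], table["keys"]
    flag = target + "_allocated"
    donors = {}  # cell key -> most recently observed target value
    result = []
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        out = dict(rec)
        if rec.get(target) is None:
            # Missing: take the current donor for this cell, if any.
            out[target] = donors.get(cell, table["default"])
            out[flag] = True
        else:
            # Observed: this record becomes the donor for its cell.
            donors[cell] = rec[target]
            out[flag] = False
        result.append(out)
    return result


# Invented sample records, in file order.
records = [
    {"sex": "F", "age_group": "30-39", "occupation": "teacher"},
    {"sex": "M", "age_group": "30-39", "occupation": "farmer"},
    {"sex": "F", "age_group": "30-39", "occupation": None},
    {"sex": "M", "age_group": "60-69", "occupation": None},
]

filled = hot_deck_allocate(records, ALLOCATION_TABLE)
for rec in filled:
    print(rec)
```

In this sketch a researcher would customize allocation by editing the table definition, for example by adding a variable to `keys` so that donors are drawn from more closely matched records.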


E1: DDI in Practice
Chair: Jostein Ryssevik

Developing the DDI and Its Applications in Taiwan
Alfred Ko-wei Hu (Center for Survey Research, Academia Sinica)

The Data Documentation Initiative (DDI) provides important infrastructure for, and an important step toward, building a web-oriented data archive. Yet preparing a DDI codebook and developing DDI-related web applications pose new challenges for a data archive that formerly relied on standalone PCs as the primary medium for data storage and daily operation. This paper examines the DDI experience at the Center for Survey Research at Academia Sinica in Taipei. The issues addressed include the following: 1) the problems in creating a DDI codebook, 2) the development of related tools used for processing the DDI codebook, 3) the relationship between the DDI and relational databases, and 4) the development of web applications in relation to the DDI. While the Center for Survey Research at Academia Sinica in Taiwan is a young and small data archive by international standards, it is hoped that its experience in the DDI project can shed light on the future development of the DDI and its add-on tools.


Cataloguing Individual Data Values within an On-line Visualisation System Using the DDI Aggregate Data Extension: The New Great Britain Historical GIS
Humphrey Southall (Great Britain Historical GIS Project, University of Portsmouth)

The Great Britain Historical GIS Project makes British historical statistics widely available, especially census data for a local history audience. Much of the data has been computerised or assembled from collaborators, but until recently it was held as many separate tables structured like the paper originals; like most archives, it was a library of datasets, not of data. A new architecture has been developed in which all statistical data are held in one column of one table, with millions of rows. Other columns contextualise data values via links to three metadata sub-systems. Location in time and space is recorded via a systematic gazetteer, based on the Alexandria Digital Library Gazetteer Content Standard and previously presented at IASSIST. The Source Documentation System links data values to the census reports they came from, enabling reassembly of the original tables. The Data Documentation System is based on the DDI Aggregate/Tabular Data Extension and plays a more interpretative role, enabling comparisons over time and defining new derived values.
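
The one-column architecture can be sketched with an in-memory SQLite database. This is an illustrative assumption, not the project's actual schema: the table and column names are hypothetical, the three metadata sub-systems are reduced to plain text columns rather than links to separate tables, and the figures are invented. The sketch shows how the same rows support both reassembly of an original table and a comparison over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE data (
        value      REAL    NOT NULL,  -- every statistic lives in this one column
        unit       TEXT    NOT NULL,  -- gazetteer link: where
        year       INTEGER NOT NULL,  -- gazetteer link: when
        source_tab TEXT    NOT NULL,  -- source documentation: original table
        source_row TEXT    NOT NULL,  -- source documentation: row label
        source_col TEXT    NOT NULL,  -- source documentation: column label
        ddi_cell   TEXT    NOT NULL   -- data documentation: what is measured
    )
    """
)

# Invented values for a hypothetical place in two census years.
rows = [
    (100.0, "Town A", 1901, "1901 Table 1", "Town A", "Males", "population:male"),
    (110.0, "Town A", 1901, "1901 Table 1", "Town A", "Females", "population:female"),
    (120.0, "Town A", 1911, "1911 Table 1", "Town A", "Males", "population:male"),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def reassemble_table(conn, source_tab):
    """Rebuild one original table as {row label: {column label: value}}."""
    table = {}
    query = (
        "SELECT source_row, source_col, value FROM data "
        "WHERE source_tab = ? ORDER BY source_row, source_col"
    )
    for row, col, value in conn.execute(query, (source_tab,)):
        table.setdefault(row, {})[col] = value
    return table


def time_series(conn, unit, ddi_cell):
    """Values of one measure for one place, ordered by year."""
    query = (
        "SELECT year, value FROM data "
        "WHERE unit = ? AND ddi_cell = ? ORDER BY year"
    )
    return conn.execute(query, (unit, ddi_cell)).fetchall()


print(reassemble_table(conn, "1901 Table 1"))
print(time_series(conn, "Town A", "population:male"))
```

The design choice the sketch highlights is that a data value is addressed by its context (place, date, source cell, measure) rather than by the dataset it arrived in.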


A DTD for Qualitative Data: Extending the DDI to Mark-up the Content of Non-numeric Data
Louise Corti (UK Data Archive, University of Essex)
Libby Bishop (UK Data Archive, University of Essex)

In this paper we present a set of recommended elements (tags) that might enable the DDI to be extended to the description of the structure and content of qualitative social science data. The DDI is appropriate for describing study, file and variable level information for qualitative datasets, but TEI-like headers are also required to enable XML-based data exploration.

ESDS Qualidata has identified a growing need for a standard framework (for data and content-level metadata) for facilitating the sharing, presentation and exchange of digital qualitative data via the web. To this end we have already developed a basic prototype methodology using XML standards and technologies. Recent work has focused on specifying a general and formal application for encoding, searching and retrieving the content of a broad class of social science data resources. Work in progress aims to formulate a recommended set of guidelines for preparing and marking up data to a common, minimum XML-based standard. The guidelines are intended for data providers and publishers who publish to online data systems such as ESDS Qualidata Online, and for software companies offering qualitative data analysis software to consider with data exchange in mind.


No Longer Lost in Translation
Kenneth Miller (UK Data Archive, University of Essex)

As part of the MADIERA project (Multilingual Access to Data Infrastructures of the European Research Area), the development of an eight-language multilingual thesaurus has continued. This paper highlights the changes made within the NESSTAR Publisher to make the task of assigning index terms from this thesaurus to DDI marked-up metadata, at study, variable group and variable level, both more consistent and less resource-intensive. The ability to easily add high-quality data content to the new MADIERA system has been given the greatest priority in this project, so that the eventual end-user features can be demonstrated to their best advantage. It is hoped that a prototype user interface, exploiting the power of the thesaurus, will be available in time for the IASSIST conference.


E2: Developing Statistical Literacy: Think Locally, Work Globally
Chair: Robin Rice

Do It Yourselves: A Peer-to-peer Approach to Professional Training
Wendy Watkins (Carleton University)

Information flowing from government sources is so voluminous and is disseminated in such a variety of formats that information professionals are under constant pressure to keep pace. To complicate matters, access to these resources is increasingly being driven by changes in communication policies and computing technology. The information professional is often responsible for staying current with new formats and methods of access, thus necessitating new approaches to training and learning on the job.

This paper examines the training strategy developed in response to the Data Liberation Initiative (DLI), which is a cooperative effort between Statistics Canada and post-secondary institutions in Canada. DLI provides access to a large volume of quantitative and spatial data through university libraries and is implemented and supported locally by academic librarians. We will discuss the use of peer instruction and the training principles employed to upgrade the skills of those called upon to provide these DLI-related services. The experience of Canada's Data Liberation Initiative illustrates the value of peer-to-peer training in building a national baseline level of service skills for a specific collection.

This presentation is adapted from a paper presented at the 69th IFLA Conference, Berlin, August 2003. The original authors are Ernie Boyko (Statistics Canada), Elizabeth Hamilton (UNB), Chuck Humphrey (University of Alberta), and Wendy Watkins (Carleton University).


Data Librarians/Archivists Should Teach Statistical Literacy as Part of Information Literacy
Milo Schield (W. M. Keck Statistical Literacy Project, Augsburg College)

Students need to be information literate. Yet if students are to evaluate information competently, they must be able to evaluate arguments using statistics as evidence; they must be statistically literate. Although statistical literacy is a popular idea, no discipline has taken responsibility for teaching such a course. This paper argues that data librarians and data archivists should take responsibility for teaching statistical literacy as a part of information literacy. Reasons are given to support this claim. This paper relates statistical literacy to information literacy, critical thinking, quantitative literacy, traditional statistics and information management. Using data professionals to teach statistical literacy is argued to be an efficient use of academic resources to achieve a mission-critical goal.  


Understanding and Using Data: A Discussion of the Jargon and Trends in "Quantitative Literacy"
Paula Lackie (Carleton College)

This paper will provide an overview of the jargon used in the US related to quantitative literacy (i.e., numeracy, statistical literacy, spatial reasoning, etc.), as well as an overview of the various tracks institutions in the US have taken to address perceived deficits in "statistical literacy."

I will also look to the audience to fill in the conversation from the perspective of our diverse membership. What can we do to facilitate communication across educational systems? What can we do to support programs already underway? Please bring your questions and/or examples from your home institutions or countries and let's work to fill in a matrix of what's happening in this important area of education.

A blog on this topic has been started to facilitate continued communication during and after the conference. Please watch the IASSIST mailing list for the URL or write to plackie@carleton.edu.

