IASSIST 2004 Conference, May 25-28

Abstracts for Friday, 28 May 2004

Plenary
Data in the Dairyland
Chair: Janet Eisenhauer Smith

Jeremy Freese (Department of Sociology, University of Wisconsin-Madison)
Larry Bumpass (National Survey of Families and Households, University of Wisconsin-Madison)
Jim Sweet (National Survey of Families and Households, University of Wisconsin-Madison)
Alberto Palloni (Health, Well-being and Aging in Latin America and the Caribbean, University of Wisconsin-Madison)

Scientific progress, public discourse, and good democracy are all well served by open, equitable access to high-quality microdata. Microdata are of "good quality" if they protect the identity and privacy of study participants and if they can provide valid statistical estimates of population characteristics. The data distribution system, on the other hand, is useful and of good quality if it provides equitable access to end-users that is appropriate to their level of computational sophistication, their knowledge of the topic, and their information needs. At the University of Wisconsin-Madison (UW), we have a long and exemplary history of creating high-quality data in the social sciences and of preserving, documenting, and distributing data from large, complex comparative and longitudinal studies, including the Wisconsin Longitudinal Study (WLS), the National Survey of Families and Households (NSFH), the Study of Health, Well-being and Aging in Latin America and the Caribbean (SABE), and the study of Puerto Rican Elderly Health Conditions (PREHCO), in addition to many other smaller studies. This session will provide an introduction to these studies, including a discussion of study objectives, sample design, and survey instruments, and a description of the kinds of research for which the data are particularly well suited. Data distribution systems will also be discussed, including the availability of public-use and restricted-access files and the conditions under which the data are available to libraries and researchers at other institutions.


F1: Building an International Network of Asian Social Science Research Data
Chair: Mary J. Lee

Building an International Data Network for China Studies
Shuming Bao (China Data Center, University of Michigan)
Karl Longstreth (China Data Center, University of Michigan)

This presentation will demonstrate the China data network project at the China Data Center of the University of Michigan. Issues covered will include internationally collaborative data development, copyright and data licensing, data service models, a sustainable international data network for data deployment and support, and the integration of the data center's functions with teaching and research.


The Development of a Survey Data Archive in Taiwan
Alfred Ko-wei Hu (Center for Survey Research, Academia Sinica)

While survey research in Taiwan has a relatively long history, dating to the late 1950s, efforts to acquire, maintain, and disseminate survey data systematically are quite new to the social science community there. The Center for Survey Research at Academia Sinica was established in 1994 as the most important, and the largest, national data provider for academic and quantitative research in Taiwan. The discussion in this paper is divided into three parts. The first part reviews the development of survey data archives in Taiwan, with a specific focus on the Center for Survey Research at Academia Sinica. The second part introduces the Center's data holdings and describes how survey data are processed, preserved, and released to the general public. The last part discusses the current status and future development of web applications.


F2: Helping Increase Statistical Literacy at Universities: Some Perspectives
Chair: S. Vincent Gray

The Challenges of Integrating Data Literacy into the Curriculum in an Undergraduate Institution
Karen Hunt (University of Winnipeg)

The successful University of Winnipeg Information Literacy program operates on the premise that students develop information literacy skills and knowledge best when opportunities for learning are integrated into the subject curriculum. This paper will discuss the results of attempting to integrate data literacy into the subject curriculum in the same way. In seeking the best practices for developing data literacy, what can be applied from the information literacy field? What is unique to learning how to discover, manipulate, and interpret numeric data?


A Model for Providing Statistical Consulting Services in a University Library Setting
Daniel Edelstein (Princeton University)
Kristi Thompson (Princeton University)

Princeton University's Data and Statistical Services consultants provide both computing and statistical consulting services to users of electronic data. This paper deals only with statistical consulting, and presents the model we have evolved to serve patrons at widely varying levels of statistical literacy. The service model is in many ways similar to that of traditional library reference service, but we have adapted it to meet the unique challenges of statistical consulting. Our service fills a major gap in the way academic statistics is taught, which is typically a highly mathematical and/or theoretical approach that leaves students ill-prepared to usefully analyze actual data. Our role is not to teach formal statistics (we don't help patrons derive proofs) but to give them just the statistical knowledge they need to use the data resources provided by the library. Much as in a traditional academic library reference interview, we are trying to help our patrons find the answer to a particular research question. Our service consists of helping our patrons answer the intermediate statistical questions that arise on the way to that goal.

In addition to describing our service model, we will enliven the paper with numerous (often humorous) examples drawn from actual consulting sessions. We hope to stimulate discussion with other consultants about our and alternative approaches.


Filling the Gap: Doing Stats in the Library
Susan Czarnocki (McGill University)
Anastassia Khouri (McGill University)

The personal computer and the World Wide Web have meant that, within a university setting, data and the tools to manipulate them can be almost ubiquitous. Unfortunately, the skills required for this manipulation are not. The establishment of the Electronic Data Resources Service (EDRS) at McGill University in 1997 meant that students and faculty were now contacting a service housed in the Library to obtain electronic numeric data. With such ease of access, it became possible for professors of undergraduate courses to contemplate including the manipulation of data as part of their course work. But there is little provision for assisting the students in such courses who are not computer literate in using the data and software both correctly and efficiently. When the Libraries were re-assigned to report, not to the Vice-Principal Academic, but to the Vice-Principal IT, the Director of Libraries began to accept an additional role for Library Services: that of providing a data specialist with experience in social research as part of the EDRS.


F3: Facilitating Data Access and Analysis
Chair: Bo Wandschneider

Delivering the World: The Establishment of an International Data Service
Susan Noble (MIMAS, Manchester Computing, University of Manchester)

In this paper we describe ESDS International, a new data service providing access to the major socio-economic databanks produced by international governmental organisations such as the World Bank and United Nations. Through the new service, these important databanks are delivered over the web, free at the point of access to the UK academic community. The paper discusses the principles behind the service, the data acquisition strategy and the establishment of licensing agreements with the data providers. The delivery software and the development of a user interface are described and we report on the challenges of converting large and complex datasets from a range of sources into a single user-friendly format. In addition to the data delivery, a pilot web-based data exploration and visualisation interface has been developed to encourage the use of the data in learning and teaching. Finally, the paper outlines the strategies and value added services employed to promote the use of these previously under-utilised databanks across a broad range of social science disciplines.

Other contributors to this work are Keith Cole, Celia Russell, James Schumm, and Nick Syrotiuk.


Integrated Online Analysis: Evaluating NESSTAR and SDA
Marc Maynard (The Roper Center for Public Opinion Research, University of Connecticut)

Online analysis of survey data files has been of significant interest to the Roper Center for a number of years. Integrating a data analysis system with existing finding aids would be of tremendous value to a wide variety of researchers. Dedicating resources to such an effort requires an evaluation of appropriate alternatives. This paper will present an evaluation of two current data analysis systems: NESSTAR and Survey Data Analysis (SDA). While not exhaustive in scope, this evaluation will focus on criteria pertaining to the Center's desire to integrate an exploratory analysis system with the iPOLL public opinion question databank. Evaluation criteria will include preparation of system files, system maintenance, ease of integration, performance issues and presentation features, among others.


Responding to Digital Data Needs: The DEWI System
Ron Nakao (Stanford University Libraries)
Chris Bourg (Stanford University Libraries)

Although data has long been an important element of social science research and instruction, the nature of social science data needs has changed dramatically in recent years. A major trend is the dramatic increase in demand for data by undergraduates for use in their own research. The number of courses that include data intensive assignments has also increased. In addition, researchers and librarians alike are recognizing the need to create electronic archives of available data.

The Data Extraction Web Interface (DEWI) System is a suite of tools for the processing, preservation, and delivery of Stanford's social science numeric data collection that connects with the existing array of computing and software resources available at Stanford. DEWI provides an integrated point of service for data users, by allowing users to browse lists of variables, search for variables, and create custom subsets of data which can be downloaded to personal computers in a variety of formats compatible with popular statistical software. In this presentation, we will describe the development of DEWI, discuss how DEWI has been used within the Stanford community, and discuss some of the directions that we are exploring in the future development of DEWI.


G1: New Avenues for Data Dissemination
Chair: Marc Maynard

The Dutch Social Science Question Bank
Marion Wittenberg (NIWI / Steinmetz Archive)
Helga van Gelder (NIWI / Steinmetz Archive)

The Dutch Question Bank is a project in which NIWI / Steinmetz Archive aims to establish a databank of question wordings from major studies in the Netherlands. Since the beginning of the 1960s, the Steinmetz Archive has collected social science datasets in order to make them available to social scientists. Making available the research instruments with which these data were collected was never seen as core business. Unlike many other archives, the Steinmetz Archive did not regularly produce full-scale codebooks incorporating these research instruments. Nowadays the questionnaires are available through the Steinmetz Archive website in PDF format, but they are not searchable. With the Question Bank project we want to investigate how such a service can best be developed without retyping the questionnaires. At the moment we are building a pilot website on which identical questionnaires are published via three different prototypes. We are planning to organize discussion groups with Dutch social scientists in which we will evaluate the different systems. In our presentation we will sketch our first experiences with the development of the three prototypes.


Grid Technologies for Social Science: The SAMD Project
Celia Russell (MIMAS, Manchester Computing, University of Manchester)

The Seamless Access to Multiple Datasets (SAMD) project is designed to demonstrate the benefits of Grid (e-Science) technologies for dataset manipulation and analyses in a social science context. Grid technologies run over existing internet infrastructures and offer a faster alternative to the world wide web for the transfer and analysis of large datasets. Under the SAMD project, a web-delivered social science dataset was made available for large-scale data analysis through a Grid architecture. Using an exemplar problem drawn from the UK social science community, the project demonstrates how the integration of a single sign-on environment, Grid technologies and access to high performance computational resources can significantly speed up computationally intensive queries and streamline data gathering and analysis. The approach used can be generalised to virtually any kind of problem involving data retrieval and analysis, and the paper also discusses how this could allow social scientists to significantly scale up their quantitative research questions.

Other contributors to this work are Keith Cole, M.A.S. Jones, S.M. Pickles, M. Riding, K. Roy, and M. Sensier.


MADIERA: A European Infrastructure for Web-based Data Dissemination: An Overview
Atle Alvheim (Norwegian Social Science Data Services)

MADIERA (Multilingual Access to Data Infrastructures of the European Research Area) is an EU-funded project. The consortium, consisting of eight European partners, aims to establish a web portal for social science data, based on the DDI and extensions to the existing Nesstar technology. New features include tools for multilingual support, logic for identifying comparable datasets, a system for geo-referencing of datasets, options for users to add their comments to datasets, links to scientific reports, etc. By November 2005 the project will establish a web portal where datasets from all the European Social Science Data Archives will be present. Furthermore, the aim is to extend the portal beyond this group of data providers. This presentation will provide a general introduction to the project, focusing in particular on the practical problems of integrating data across several national archives, limits of the DDI, politics of data access, harmonising categories, etc. For more information see www.madiera.net.


G2: Three Studies with Numeric and Geospatial Data in Asia - the Case of China, Vietnam and Korea
Chair: Lu Chou

Historical Geodata for Pre-Modern China - A Case Study of the CHGIS Project
Merrick Lex Berman (China Historical Geographic Information System, Harvard Yenching Institute)

The China Historical GIS (CHGIS) has been developing a base GIS framework of all the recorded administrative divisions for dynastic China, from the unification of the first Chinese Empire (222 BCE) to the fall of the last Dynasty (1911 CE). The CHGIS project is not producing or incorporating historical statistics for these administrative units, but is specifically focused on the more fundamental matter of compiling all administrative units into a single geospatial database. Each unique historical unit is defined with: a date range, a place name, a feature type, a source citation, a relationship to its parent jurisdiction, and an associated spatial object in GIS.

The CHGIS project developed a relational data model for keeping track of historical places and documented sources as they changed over time. Many technical hurdles and system integration issues had to be dealt with, including: developing a search engine for the Web with guide maps, defining spatial objects for ancient places, system integration of multilingual datasets, and testing semantic interoperability between feature type thesauri.

We hope that our experiences and the CHGIS datasets themselves will be of interest to everyone dealing with digital sources of historical geographic information. We also welcome collaboration in the development of application methods that can be used together with our base GIS framework.
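
As an illustration only, the unit definition listed in this abstract (date range, place name, feature type, source citation, parent jurisdiction, spatial object) could be sketched as a simple record type. This is not the CHGIS data model itself: the class, field names, the `units_in_year` helper, and the sample records are all hypothetical, and the spatial object is reduced to a single coordinate pair.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HistoricalUnit:
    """Hypothetical record for one historical administrative unit."""
    unit_id: str
    name: str                    # place name
    feature_type: str            # e.g. "county", "prefecture"
    begin_year: int              # negative values stand for BCE
    end_year: int
    source: str                  # source citation
    parent_id: Optional[str]     # parent jurisdiction, if any
    point: Tuple[float, float]   # stand-in for a spatial object (lon, lat)

    def exists_in(self, year: int) -> bool:
        """True if the unit's date range covers the given year."""
        return self.begin_year <= year <= self.end_year


def units_in_year(units: List[HistoricalUnit], year: int) -> List[HistoricalUnit]:
    """Return the units whose date range includes the given year."""
    return [u for u in units if u.exists_in(year)]


# Invented sample records, for illustration only.
EXAMPLE_UNITS = [
    HistoricalUnit("u1", "Example Prefecture", "prefecture",
                   -222, 618, "hypothetical source A", None, (110.0, 34.0)),
    HistoricalUnit("u2", "Example County", "county",
                   100, 1911, "hypothetical source B", "u1", (110.5, 34.2)),
]
```

Storing an explicit date range on every record is what allows a query such as `units_in_year(EXAMPLE_UNITS, 500)` to reconstruct the set of units in existence at a given time.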


Report on the Recent Stay as a Fulbright Scholar in Vietnam
Daniel Tsang (University of California-Irvine)

As Vietnam seeks membership in the World Trade Organization, many studies have been conducted of its economy and society. I will report on my recent stay in the country as a Fulbright scholar researching social science data collections and their availability, and national efforts to improve its statistical infrastructure.


Quantitative and Geospatial Social Science Data in Korea
Mary J. Lee (Laboratory for Social Research, University of Notre Dame)

This presentation will examine the infrastructure and feasibility of quantitative and geospatial social science data in Korea.


G3: Enhancing the Research Experience for Data Users
Chair: Ann Gray

Pointers for Secondary Analysis of Public Opinion Data
Lois Timms-Ferrara (The Roper Center for Public Opinion Research, University of Connecticut)

Polling data are everywhere. During a presidential election in the United States, polls take on a life of their own. For a full 10 months each daily newspaper cites at least one new survey. How do you tell good polls from bad? What are some of the analytical tools that need to be considered when examining the mountains of available data? How does one design a research question? These questions can confuse anyone. How can we as social science information professionals help?

This paper explores some "suggestions" for doing sound secondary opinion research. From the basic questions of sampling, error, and reading tables, to the more sophisticated concerns of question and data interpretation and statistical tests, this presentation will provide helpful pointers to assist the novice and seasoned researcher. The paper will call attention to various sources of assistance for data exploration, locating relevant information, assessing its value, and presenting the information in a clear and concise manner.

Using Roper Center data and metadata, this presentation will offer illustrated examples of how to best utilize Center resources and other sources for the secondary analysis of polls.


Reconceptualizing Statistical Abstracts in the 21st Century: An Empirical Study of the Sourcebook of Criminal Justice Statistics
Carol Hert (Syracuse University)
Lydia Harris (University of Washington)

Statistical abstracts have always formed a core source of statistical information for a wide variety of users. The increasing technological capabilities of online media have led to an interest in understanding how statistical abstracts might be adapted or transformed in the age of the Web.

This paper reports on a study in which the Delphi technique was used to develop a consensus among a set of experts on the future of one particular abstract: Sourcebook of Criminal Justice Statistics. Participant input was used to generate a mission statement for the Sourcebook and a prioritized list of requirements for accomplishing the mission.

The findings indicate a continuing role for the statistical abstract but one that can better utilize technologies to create more personalized statistical displays as well as enhanced access to additional sources.

Acknowledgements: This study was funded by the United States Bureau of Justice Statistics.


Research in ICTs and Political Behavior: What We Know and Don't Know About Technology and Political Life
Alice Robbin (Indiana University Bloomington)
Christina Courtright (Indiana University Bloomington)
Leah Davis (Indiana University Bloomington)

A wide range of studies in political science, sociology, communication, cognitive and social psychology, library and information science, and other disciplines have been conducted for more than half a century on various aspects of user behavior related to the use of new technologies. This research has led to significant progress in the technical and computational aspects of information storage, retrieval, and use, the development of a global information infrastructure, and the usability of products and services. More recently, research has focused on the role of information and communication technologies (ICTs) in political life. This paper evaluates the status of this empirical research. We find that, with notable exceptions, research on e-government, e-governance, and e-democracy has not avoided and, indeed, struggles with well-known conceptual, theoretical, and methodological problems that contribute to a lack of robust empirical evidence to support claims that political life has been altered by ICTs. It may well be that some of these problems defy solutions; however, we remain optimistic and conclude by offering suggestions for improving the quality of research data on the relationship between ICTs and political behavior.


