Editor: Joanne Juhnke, Special Librarian
April 2008
Brave New World of Synthetic, Organic, and Sonic-Optic Data

Mention quantitative analysis and it will take most readers of this Newsletter little stretch of the imagination to envision a numeric data file. Most social science microdata follows the classic pattern: individual responses to survey questions, coded for statistical analysis. For decades social scientists have reaped the benefits of analyzing public use microdata files for insights into social behavior. Over the years, however, the boundaries of traditional social science data have stretched in various ways.

With rapid advances in technology and online access to public data, the potential for respondent confidentiality to be compromised has increased. As February’s DISC News highlighted, restricting access to data has been an important approach in protecting confidentiality. Another solution to this problem has been to modify the data itself. Microdata files for public use are routinely modified in order to mask individual respondents’ identities, often with unfortunate implications for the statistical quality of the data. Synthetic data is a new approach that generates multiple subsamples from the original survey data, protecting confidentiality while simultaneously preserving critical statistical properties of the data: mean, variance, and covariance. The data is inference-valid, but impossible to link back to an individual. The SIPP Synthetic Beta file (http://www.sipp.census.gov/sipp/synth_data.html) is the first of its kind to be released by the Census Bureau, created by integrating data from the Survey of Income and Program Participation (SIPP), Social Security Administration (SSA), and Internal Revenue Service (IRS).
Birth date, death date, marital history, and immigrant status are among the synthesized variables in this collaboratively produced dataset, designed to reproduce the characteristics of the underlying confidential microdata.

At another point on the confidentiality spectrum one finds biosocial data: population-based sample surveys that combine demographic, social, and behavioral data with biological indicators. Most biosocial data refers to markers long associated with health surveys, such as grip strength, pulmonary functioning, blood pressure, heart rate variability, weight and height, perceived age, clinical measurements of various substances in the blood, saliva, or urine, and various other measures of risk factors, exposures, and health outcomes, which social scientists can use to better estimate environmental and behavioral effects on health. One example of such data collection is the National Social Life, Health, and Aging Project (NSHAP), an in-home survey of 3,005 persons aged 57 to 84 that collected biomeasures of health and physiological functioning to better characterize the health of survey participants. Height, weight, saliva sample, and distance vision are some of the biomarkers collected (http://biomarkers.uchicago.edu/timing.html). NSHAP data is available through ICPSR.

When DNA is collected, genetic variations become variables, inevitably leading to questions such as whether or not genetic indicators should be analyzed in the context of understanding complex social traits. At what point do sociology and behavioral/natural science overlap? Are human behavior and culture a product of natural selection?

In yet another twist on social science data, what if the variables in the datasets weren’t limited to the survey instrument? What if researchers could create their own variables from raw video footage?
DISC staff members Lu Chou and Cindy Severt recently attended a Research Computing Workshop on TRANSANA, a software product for analyzing digital video or audio data. Developed by a graduate student and maintained at the Wisconsin Center for Education Research (http://www.transana.org/download/index.htm), TRANSANA works by coding events in video clips and linking those events to an audio transcript. To illustrate, a researcher studying how children learn in a classroom setting might notice recurring episodes of laughter during a 60-minute video. These moments can be tagged to a transcript and selected, not unlike selecting variables by column location. Laughter, silence, activity, or even facial expression can be “extracted” and exported as a tab-delimited file for analyzing with statistical software. What happens just prior to and after the laughter might be equally important, and TRANSANA offers a means of leveraging the richness of that context.

Sociology is an evolving field that continues to cross disciplinary lines into new sub-fields. Whether the data is questionnaire-derived, artificial, biological, or qualitative, in the words of Vincent van Gogh, “there is nothing in the world as interesting as people, and one can never study them enough.”

News from ICPSR

Summer Program Stipend

TIGER/Line Files@ICPSR

PK-3 Data Resource Center
The Foundation for Child Development has announced a small grants program to be funded through its PK-3 Research and Evaluation Forum. A maximum of four awards of up to $50,000 each will be made to researchers proposing to use data from the PK-3 Data Resource Center. See http://www.icpsr.umich.edu/PK3/spotlight/rfp.html for more details.

Researcher's Notes

I am a PhD candidate in the Sociology Department, and my research focuses on the expansion of Brazilian higher education during recent decades. My goal is to evaluate the consequences of this expansion with respect to inequality and social stratification in Brazil, especially the changes in the likelihood of access to higher education for students of different social backgrounds. In order to do this research, I needed nationally representative data for the period. One of the best sources of data for this investigation is the National Household Sample Survey, or PNAD, which is coordinated by the Brazilian Institute of Geography and Statistics (IBGE). DISC already had several waves of PNAD data, some of them with codebooks translated from Portuguese to English. I recently had the opportunity to work with DISC personnel to acquire the last five waves of the survey (2001-2006). Their help has been crucial to the development of my research.

The PNAD was first implemented in Brazil in 1967, and has been repeated annually since 1970 (except for census years). This rich, publicly available dataset is multipurpose with a strong focus on the labor market, including questions about general aspects of the population, education, income, and housing, as well as migration, marriage, and fertility. Each year the core questions of the survey are kept unchanged, allowing for the investigation of national trends across time. Most waves also contain special supplements focusing in depth on specific social issues such as health, education, and female fertility.
PNAD is an invaluable resource for students interested in population, labor, and social stratification issues in Latin America, with a focus on Brazil.

Census 2010 and Paper-Based Technology

Plans for door-to-door data collection for the United States’ 2010 Census took a decided turn for the non-technical in early April. The U.S. Census Bureau had planned to use wireless handheld computers both for collecting answers to the questionnaire and for verifying residential street addresses. Now, due to a mishandled contract with an outside vendor, Secretary of Commerce Carlos Gutierrez has announced that the Census will have to rely on paper forms for the questionnaire data. The paper forms represent a return to the technology that the Bureau was attempting to leave behind for 2010. The Florida-based Harris Corporation, which had been awarded the contract, will still work with the Census Bureau to provide handheld computers for address canvassing. For more information from the Census Bureau about the handheld computers, see http://tinyurl.com/4bebl3.

In an unrelated move in the direction of traditional data collection, the 2010 Census will not provide an option for respondents to answer the questionnaire online. This is a change from the 2000 Census, in which an Internet response option was made available for the first time, though with very little publicity due to confidentiality concerns. Until mid-2006, the Census Bureau had indicated that an Internet response option was intended as part of the 2010 Census, but it then reversed that decision as plans moved forward. To read more about the Internet response option issue, see http://www.itif.org/files/eCensusUnplugged.pdf.

Please note: DISC will be closed
Crossroads Corner

Crossroads Corner highlights web sites recently added to the searchable Internet Crossroads in Social Science Data on the DISC web site.

European Data Center for Work and Welfare (EDACwowe)

EDACwowe organizes its site, via a left-hand menu bar, around the categories of Comparative Data, National Data, and International Repositories. The Comparative Data category is the most detailed, with subheadings for opinion surveys, socio-economic surveys, indicators and statistics, and policies and institutions. Each survey in the Comparative Data category gets a multi-part description on the EDACwowe site, including survey type, participating countries, topics, and availability and searchability of questionnaires and data. The National Data category, by contrast, gives only links and archive names, and the International Repositories category gives a short descriptive paragraph for each link. EDACwowe is coordinated and supported by the University of Tilburg (The Netherlands) and by the Danish National Centre for Social Research.

USA Counties

This past month, the Census Bureau announced that downloadable data files in Excel format have been added to the USA Counties site. Users can now bypass the drop-down menus and directly download files by topic, each file containing data for all of the counties nationwide.

TrafficSTATS

TrafficSTATS is a joint project between Carnegie Mellon University and the AAA Foundation for Traffic Safety, and can be found at http://www.aaafoundation.org/trafficSTATS/.
©2009 Board of Regents of the University of Wisconsin System.
If you have trouble accessing this page, please contact disc@mailplus.wisc.edu.