Please note: Older issues of the newsletter are likely to contain
broken links -- the newsletter is presented here "as published."
DPLS News contains articles about local, national, and international data issues.
It is published twice a semester by the library staff.
Editor: Joanne Juhnke, Special Librarian
Contributors: Lu Chou, Senior Special Librarian, & Cindy Severt, Senior Special Librarian
Table of Contents
Five Exabytes of Information
ICPSR Variables Database Now Online
Two CD-ROM Updates
Recently-released Sexual Behavior Surveys
ICPSR Summer Program
Did You Know? Two Copy-and-Paste Tips
Five Exabytes of Information
“We are drowning in information,” said Rutherford D. Rogers, a Yale librarian, during an interview with the New York Times in 1985. He was commenting on the enormous number of books, periodicals and other documents published each year. Now, almost two decades later, the amount of information created each year is immense, and we are trying hard to keep our heads above this endless sea of information.
Is there a way to quantify all the information being produced in the world every year? In 2000 a research team at the School of Information Management and Systems (SIMS) at the University of California, Berkeley, led by Peter Lyman and Hal Varian, calculated that the amount of new information created in 1999 was large enough to reach to the moon and back 13 times. That 2000 report generated so much interest that Lyman and Varian did a follow-up study using 2002 data and published their report in 2003.
According to “How Much Information? 2003”, approximately 5 exabytes of new information were created and stored in 2002, double the information created in 1999. An exabyte (EB) is 1,000,000,000,000,000,000 bytes. Five exabytes of information presented in print form would fill half a million new libraries the size of the Library of Congress!
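The scale of these figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (5 exabytes, half a million libraries, a doubling since 1999). The variable names are illustrative, and the per-library and 1999 values it prints are simply what those numbers imply, not figures taken from the report.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
EXABYTE = 10**18   # bytes, as defined above
TERABYTE = 10**12  # bytes (decimal convention, an assumption of this sketch)

new_info_2002 = 5 * EXABYTE   # new information created and stored in 2002
libraries = 500_000           # "half a million" Library of Congress equivalents

# Size of one print library implied by the comparison
implied_library_bytes = new_info_2002 // libraries

# "Double the information created in 1999" implies about half as much in 1999
implied_1999_bytes = new_info_2002 / 2

print(f"Implied size of one library: {implied_library_bytes / TERABYTE:.0f} TB")
print(f"Implied 1999 total: {implied_1999_bytes / EXABYTE:.1f} EB")
```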
The researchers looked at newly created information stored in four physical media: print, film, magnetic, and optical. To quantify the new information, the team made working assumptions about storage compression and duplication, and relied on estimates to produce the measurement.
Beyond print and fixed digital media, the study also measured information flows going through four major electronic channels -- telephone, radio, TV and Internet -- to identify trends in the production and consumption of information.
In a separate interview, Lyman noted that the study does not deal with how people cope with such massive amounts of information, but said that his next goal is to study the consumption of information. He also pointed out that the current study does not address the quality of information sources, but focuses solely on quantity.
“How Much Information? 2003” is available on the web at http://www.sims.berkeley.edu/research/projects/how-much-info-2003/.
ICPSR Variables Database Now Online
Through funding from the National Science Foundation, ICPSR has created a Social Science Variables Database (SSVD), http://www.icpsr.umich.edu:8080/SSVD/basicSrch, that enables searching across studies at the variable level for over 33,000 discrete variables in 69 different ICPSR studies. Such variable-level, cross-study searching is made possible by the Data Documentation Initiative (DDI), a codebook markup standard.
A basic search using the search string ELECTIONS yielded 307 entries in 16 different studies, including 17 hits from the General Social Surveys 1972-2002 cumulative file. Each hit displays the variable mnemonic, the variable label, the question, and the ICPSR study number.
Clicking on the mnemonic brings up unweighted frequencies for that variable; clicking on the variable label brings up the ICPSR study description.
The Data Documentation Initiative is an international effort to establish a standard for technical documentation describing social science data. Codebooks marked up according to the DDI standard can be easily transported into the SSVD to be searched without additional processing.
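To illustrate why marked-up codebooks make variable-level searching straightforward, here is a minimal Python sketch. It is not the SSVD's actual implementation. The element names are modeled on DDI codebook markup, while the two variables, the function name, and the sample text are made up for illustration. Real DDI files are much richer and may use XML namespaces that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Toy codebook fragment. Element names are modeled on DDI codebook markup;
# the variables themselves are hypothetical and exist only for this example.
SAMPLE_CODEBOOK = """
<codeBook>
  <dataDscr>
    <var name="VOTE_X">
      <labl>Voted in last national elections (hypothetical)</labl>
      <qstn><qstnLit>Did you vote in the last national elections?</qstnLit></qstn>
    </var>
    <var name="AGE_X">
      <labl>Age of respondent (hypothetical)</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""

def search_variables(codebook_xml, term):
    """Return (mnemonic, label, question) for variables mentioning the term."""
    root = ET.fromstring(codebook_xml)
    term = term.lower()
    hits = []
    for var in root.iter("var"):
        label = (var.findtext("labl") or "").strip()
        question = (var.findtext("qstn/qstnLit") or "").strip()
        # Case-insensitive match against the label and the question text
        if term in label.lower() or term in question.lower():
            hits.append((var.get("name"), label, question))
    return hits

for mnemonic, label, question in search_variables(SAMPLE_CODEBOOK, "ELECTIONS"):
    print(mnemonic, "|", label, "|", question)
```

Because every codebook uses the same tags for the mnemonic, label, and question, the same search function can run over any number of studies without study-specific processing.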
Two CD-ROM Updates
DPLS now offers updated versions of CPS Utilities and CensusCD on our public-use workstations. Both are popular among our users because their user-friendly data retrieval programs help users extract data from otherwise-complex data sources.
CPS Utilities: Annual Demographic File, 1962-2003
Starting with the 2003 survey, this series is now called the Annual Social and Economic Supplement. It is commonly known as the March supplement of the Current Population Survey (CPS). The CPS is the source of many official federal government statistics, such as the monthly unemployment rate, and is administered by the Bureau of the Census under the auspices of the Bureau of Labor Statistics (BLS). The March supplement contains basic demographic data as well as data on work experience, income sources and amounts, noncash benefits, health insurance, and migration.
The CPS files distributed by the Census Bureau are inconvenient to use in several ways: variables change location and length over time; old variables are dropped and new ones added; and codings change from time to time. The CPS Utilities user interface allows users to view the questionnaires, search variables, select specific variables, and easily create data subsets containing those variables for chosen years. It also offers uniformly recoded versions of selected variables.
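The problem of variables changing location and length from year to year can be pictured with a short Python sketch. This is not how CPS Utilities works internally. The layouts, variable names, and records below are entirely hypothetical and only show the idea of keeping one column map per survey year.

```python
# Hypothetical per-year layouts: variable -> (start column, length), 0-based.
# Real CPS record layouts are far longer and are documented in each codebook.
LAYOUTS = {
    1990: {"age": (0, 2), "income": (2, 5)},
    2003: {"age": (3, 2), "income": (5, 6)},  # same variables, moved and widened
}

def extract(record, year, variables):
    """Pull the requested variables from one fixed-width record as integers."""
    layout = LAYOUTS[year]
    row = {}
    for name in variables:
        start, length = layout[name]
        row[name] = int(record[start:start + length])
    return row

# Made-up records, one per layout
print(extract("3412500", 1990, ["age", "income"]))
print(extract("XYZ34012500", 2003, ["age", "income"]))
```

Both calls yield the same age and income even though the columns differ, which is the kind of bookkeeping a retrieval program spares the user.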
CensusCD 2000 products
DPLS has purchased six CensusCD 2000 titles: Long Form Summary File 3 (SF3), Long Form Demographic Profile, Short Form Summary File (SF1), Short Form Data Blocks Level, Redistricting (Public Law 94-171) Data, and Redistricting Data Blocks Level. Each CensusCD comes with an easy-to-use interface that lets users generate full-blown maps or tables. Users can extract data as DBF, ASCII, shape or MIF/MID files and use them as input files for other programs, such as statistical (SAS, SPSS), database (Access, Oracle), spreadsheet (Excel, 1-2-3), and mapping (ArcView, MapInfo) software.
Recently-released Sexual Behavior Surveys
Two sexual behavior studies have been released by the Population Research Center at the National Opinion Research Center (NORC) at the University of Chicago:
Chicago Health and Social Life Survey, 1997
“Today’s marriages occur later in life and are often briefer, requiring a new dynamic in the ways in which people meet and form relationships.” This is what Principal Investigator Dr. Edward O. Laumann at the University of Chicago has concluded using data from this newly released survey. This survey focuses on sexual behavior. There are 17 sections, including among others: demographic data; sexual partners; detailed information on two most recent partners; partners in last 12 months; use of birth control methods; respondent’s social networks; respondent’s exposure to sexual contact as a child; respondent’s exposure to forced sexual contact; first sexual intercourse; lifetime sexual history; neighborhood characteristics; general and sexual health; attitudes toward sexuality and sex roles; domestic violence; sexual orientation; general sexuality. To learn more about this survey and download the data, visit http://www.src.uchicago.edu/prc/chsls.php.
Chinese Health and Family Life Survey, 1999
Contemporary China is on the leading edge of a sexual revolution, with tremendous regional and generational differences that provide unique natural experiments for analysis of the antecedents and outcomes of sexual behavior. This study produces a baseline set of results on sexual behavior and disease patterns, using a nationally representative probability sample. It is one of the first omnibus studies of sexual behavior in a developing country. Topical areas include childhood sexual contact, intimate partner violence, forced sex, sexual harassment, body image concerns, sexual well-being, and sexually transmitted diseases and risk behavior. This study is available online at http://www.src.uchicago.edu/prc/chfls.php.
ICPSR Summer Program
The ICPSR Summer Program in Quantitative Methods of Social Science Research will be held between June and August 2004. Most, but not all, of the workshops are held at the University of Michigan at Ann Arbor. Offerings vary from introductory to advanced, and from 3-5 days to 4 weeks in length. The application deadline is April 26; 3- to 5-day workshops are filled on a first-come, first-served basis. Course schedules and online application materials may be found at http://www.icpsr.umich.edu/training/summer/index.html.
Did You Know? Two Copy-and-Paste Tips
Did you know that it is possible to copy and paste from an HTML table on the web into an Excel spreadsheet? In the Internet Explorer browser on a PC, use the left mouse-button to highlight the table or the area of the table that you want to copy. Then press CTRL-C (shortcut for “copy”). Open a blank Excel worksheet and position your cursor in the cell that you want at the upper left of the table. Then press CTRL-V (shortcut for “paste”). The more straightforward the table, the more likely that it will copy well.
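For readers who would rather script this step than copy by hand, the same idea can be sketched in Python using only the standard library. This is an alternative to the copy-and-paste tip, not part of it. The class name, function name, and sample table are made up for illustration, and the sketch assumes a simple, well-formed table with no nested tables or merged cells.

```python
import csv
import io
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def table_to_csv(html):
    """Convert a simple HTML table to CSV text that Excel can open."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()

# Made-up sample table
SAMPLE = "<table><tr><th>County</th><th>Value</th></tr><tr><td>A</td><td>1</td></tr></table>"
print(table_to_csv(SAMPLE))
```

As with the manual method, the more straightforward the table, the more likely it is to convert cleanly.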
Did you know that you can also copy and paste text from a PDF document into Word? The key to this trick is the text-selection tool in the Acrobat Reader. In version 5 of the Reader, there is a toolbar icon with the letter “T” and a dotted box next to it. Clicking on that will give you the text selection tool. Clicking on the drop-down arrow next to it will give you the choice of that tool or the column selection tool. Column select allows you to specify a region of the screen to select, instead of having to select whole lines. In either case, once the text is selected, use CTRL-C to copy and CTRL-V to paste into Word.
The European Social Survey (ESS) is a new survey encompassing 19 European nations and focusing on public attitudes and values related to the ongoing change in social institutions. The effort is funded by the European Commission, the European Science Foundation, and academic institutions in the participating countries. An initial round of data collection was completed in 2001, using a core questionnaire and two rotating modules. The microdata was released in August 2003 via the NESSTAR system at http://ess.nsd.uib.no/nesstarlight/index.jsp. Funding for a second round of data collection has also been obtained, with data release scheduled for 2005.
Questionnaires, methodology and other background information are available at http://www.europeansocialsurvey.org/.
Conducting Research Surveys via E-mail and the Web, a 118-page document from the RAND Corporation (by Matthias Schonlau, Ronald D. Fricker, Jr., and Marc N. Elliott), examines the burgeoning trend of online research surveys. The authors carry out a literature review, discuss the advantages and disadvantages of online surveys, and offer practical suggestions for design and implementation. A chapter of case studies rounds out the publication.
The seven chapters and three appendices may be downloaded together or separately in PDF at http://www.rand.org/publications/MR/MR1480/.
The University of Texas Inequality Project (UTIP) is a research group concerned with inequality in wages and earnings worldwide. Their work emphasizes Theil's T statistic, which they apply to industrial data from the United States, the Organization for Economic Cooperation and Development (OECD), and the United Nations Industrial Development Organization (UNIDO). They compare their results to the more famous Deininger and Squire household inequality datasets, published by the World Bank.
The site carries two freely-downloadable datasets in Excel format, plus working papers in PDF and presentations in PowerPoint. Visit the UTIP site at http://utip.gov.utexas.edu/.
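For readers unfamiliar with Theil's T, the basic form of the statistic is the average of (x/mean) times the natural log of (x/mean) over all incomes x. The Python sketch below computes that basic individual-level form. It is offered only as an illustration, not as UTIP's procedure, which applies the measure to grouped industrial data. The function name and sample values are made up.

```python
import math

def theil_t(incomes):
    """Theil's T for a list of positive incomes: mean of (x/mu) * ln(x/mu)."""
    n = len(incomes)
    mu = sum(incomes) / n
    # Zero or negative incomes are not handled: ln(x/mu) is undefined for them.
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n

print(theil_t([10, 10, 10, 10]))  # perfectly equal incomes give zero
print(theil_t([1, 1, 1, 37]))     # one earner holds most of the total
```

The statistic is zero when everyone earns the same amount and grows as income becomes more concentrated, up to a maximum of ln(n) for n earners.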
Wisconsin Food Security Project
The Wisconsin Food Security Project is a cooperative effort between the University of Wisconsin Extension and the Wisconsin Department of Health and Family Services. Households are considered food insecure if their access to food through normal channels is limited or uncertain. The website, at http://www1.uwex.edu/ces/flp/cfs/, presents numbers by Wisconsin county for indicators in the following categories: federal nutrition assistance programs, food production and marketing, community infrastructure and nutrition education, emergency food programs, and economic indicators.
Unfortunately, time series data is not available, though some indicators do list percent-change over several years. Reports, presented as HTML tables, can compare counties or cover the entire state.