DPLS News, December 1999

DPLS News contains articles about local, national, and international data issues.
It is published twice a semester by the library staff.

XML... Where Do We Go From Here?

On November 29, 1999, DPLS was invited to take part in a DoIT-sponsored campus-wide forum on XML. Our participation focused on the Data Documentation Initiative (DDI) beta-test reported in the May 1999 Newsletter, http://dpls.dacc.wisc.edu/pubs/Newsletters/may99news.html#story1. The objectives of the forum were to:

  • Increase campus awareness of the growing use of XML in the IT industry.

  • Describe the problem of varying standards for defining the semantics and display characteristics of XML files.

  • Provide an opportunity for campus IT users to raise issues concerning the future use of XML.

XML stands for eXtensible Markup Language. XML is used to organize the contents of a document into elements. A codebook, for example, might include such elements as principal investigator, coding value, and variable. XML uses tags of the form <tagname> to begin an element of the document, and </tagname> to mark the end of an element. Because XML tags denote contents (as opposed to HTML tags which denote presentation), XML documents can be accessed by other applications, provided those applications can also access a common definition of the tags. This is similar to the way in which standard definitions for HTML tags have been published and hard-coded into browsers.
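As a rough illustration of how an application might read such a document, here is a minimal Python sketch. The tag names (codebook, principalInvestigator, variable, codingValue) and the sample values are invented for this example and are not taken from the DDI tag library; the point is only that a program which knows the meaning of the tags can pull out the contents directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment. The tag names and values are
# illustrative assumptions, not part of any published tag library.
CODEBOOK_XML = """
<codebook>
  <principalInvestigator>Jane Doe</principalInvestigator>
  <variable name="SEX">
    <label>Sex of respondent</label>
    <codingValue code="1">Male</codingValue>
    <codingValue code="2">Female</codingValue>
  </variable>
</codebook>
"""


def read_codebook(xml_text):
    """Return the investigator name and a dict of variables with their codes."""
    root = ET.fromstring(xml_text.strip())
    investigator = root.findtext("principalInvestigator")
    variables = {}
    for var in root.findall("variable"):
        # Map each coding value's code attribute to its text label.
        codes = {cv.get("code"): cv.text for cv in var.findall("codingValue")}
        variables[var.get("name")] = {
            "label": var.findtext("label"),
            "codes": codes,
        }
    return investigator, variables


investigator, variables = read_codebook(CODEBOOK_XML)
print(investigator)
print(variables["SEX"]["codes"])
```

A second application could read the same file only if it shared the same understanding of what each tag means, which is why a common definition of the tags matters.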

Given the commitment to XML by a number of vendors (Oracle, Novell, Microsoft, etc.) whose software and systems are widely used on campus, there is no real question about adopting XML. It will happen. The question is, how? XML is a meta-language, but as such it does not enhance interoperability by itself. Multiple applications which use the same XML files must have access to the same Document Type Definitions (DTD), or some other definition of the semantics of the database. For this to happen, they first have to agree on which definition to use. DPLS’s DDI project beta-tested the tag library for social science documentation (see http://dpls.dacc.wisc.edu/ddi/index.html).

According to ICPSR and the DDI committee, which have taken a leadership role in developing a DTD for social science, the goal is to develop a codebook standard that serves as an interchange format and allows the development of Web applications such as:

  • NESSTAR, http://www.nesstar.org/. An infrastructure for data dissemination via the Internet. The NESSTAR Explorer provides an end user interface for searching, analyzing, and downloading data and documentation.

  • CESSDA Integrated Data Catalog, http://dastar.essex.ac.uk/Cessda/IDC. A current effort to provide electronic searching across several social science data archives around the world.

(Sections of this article are adapted from Issue Statement: Use of XML on the UW-Madison Campus by Wayne Shockley, Architecture Dept., DoIT, UW-Madison,

Researcher's Notes
by John Straub

Using Census data has become significantly more manageable, thanks to some very handy CD-ROM products. One that’s made my life a lot easier is called Census CD + Maps. The product is produced by a small company called GeoLytics, and is available at DPLS. They use new data compression techniques so a single disc gives you access to information that would normally fill 75 CDs!

Taking only seconds to load, Census summary table data and cross-tabs on over 3,500 demographic variables are available for a wide variety of geographic entities down to the block group level. The feature that has been especially useful to me is the ability to define my own circular area of interest by inputting a latitude, longitude, and radius. Census CD + Maps aggregates over all the relevant census blocks, and creates an output file with summary data for the area of interest.
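For readers curious about what this kind of radius query involves, the following Python sketch shows one plausible approach. It is not a description of how Census CD + Maps works internally: the record layout, the made-up population figures, and the rule of selecting a block when its center point falls within the radius are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used by the haversine formula


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def summarize_circle(blocks, center_lat, center_lon, radius_miles):
    """Sum population over blocks whose center point lies inside the circle."""
    selected = [
        b for b in blocks
        if haversine_miles(center_lat, center_lon, b["lat"], b["lon"]) <= radius_miles
    ]
    return {
        "n_blocks": len(selected),
        "population": sum(b["population"] for b in selected),
    }


# Made-up block records (hypothetical field names and values).
blocks = [
    {"id": "A", "lat": 43.0731, "lon": -89.4012, "population": 120},
    {"id": "B", "lat": 43.0800, "lon": -89.4100, "population": 80},
    {"id": "C", "lat": 43.5000, "lon": -89.4012, "population": 300},
]

summary = summarize_circle(blocks, 43.0731, -89.4012, 2.0)
print(summary)
```

Here blocks A and B fall within two miles of the chosen center and block C does not, so the summary covers only the first two.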

You can also create maps with this product, which are exportable in a number of formats including ArcView shape files.

New at DPLS: MicroAnalyst

This product provides users with an easy tool for accessing Public Use Microdata Samples (PUMS) of the 1980 and 1990 decennial census. The PUMS contains a sample of individual long-form census records showing most population and housing characteristics. Both 5% and 1% samples are included in MicroAnalyst. Integrated Public Use Microdata Samples (IPUMS) for 1850 through 1970 are also among the data modules available to users. The user-friendly MicroAnalyst interface makes it remarkably easy to query, explore, and extract PUMS data, compared to the days when researchers had to wade through ASCII-format PUMS data on mainframe computers. To fit all the data on four CD-ROMs, a special file format with extensive indexing and compression is used. MicroAnalyst is installed on two public-use workstations at DPLS. Users are welcome to test-drive this powerful new tool.

The Occupational Information Network (O*NET)

The workplace has changed dramatically in the past decades, and the Occupational Information Network (O*NET) provides re-defined terms to describe the new occupational landscape. The O*NET database on CD-ROM, now available at DPLS, contains comprehensive information about job requirements and worker competencies for over 1,100 occupations. Each occupation is classified by a unique O*NET title and code and is cross-referenced to eight other classification systems, including the 1998 Standard Occupational Classification (SOC) system.

O*NET was developed by job analysts to replace the older Dictionary of Occupational Titles. Future O*NET updates will include data from employees and supervisors currently working in the occupations listed. For more information about O*NET, check the DPLS catalog at our web site or visit http://www.doleta.gov/programs/onet/.

Political Elites in Mexico, 1900-1971

DPLS’ Online Data Archive continues to grow with the addition of Political Elites in Mexico, 1900-1971. Professor Peter Smith, a former member of the UW-Madison Department of History faculty, collected the study data from 1969 to 1971 and deposited it with DPLS for posterity in 1974. DPLS staff have recently chosen to make these data files available to a wider audience through online preservation.

The "elites" included in the study are 6,302 individuals who occupied a national political office in Mexico at any time between 1900 and 1971 (e.g., senators, members of the presidential cabinet, and ambassadors).

The variables found in this study include the date and place of birth and death of each "political elite," the geographical entity each politician represented, and their individual membership in political organizations, as well as their known activity during the revolution of 1910-1920.

For more information about the study, please refer to the following location on our web site: http://dpls.dacc.wisc.edu/mexican_political_elites/Index.html.

Researchers interested in individual office-holders should consult the electronic version of the document entitled "Identification Numbers and Personal Names for Individuals Included in the Dataset on Political Elites in Mexico, 1900-1971," http://dpls.dacc.wisc.edu/mexican_political_elites/mpe_ids.htm.

General Social Survey Student Paper Competition

The National Opinion Research Center (NORC) at the University of Chicago has announced this year’s General Social Survey (GSS) Student Paper Competition. The entry papers must: 1) be based on data from the 1972-1998 GSS or from the International Social Survey Program (any year or combination of years may be used), 2) represent original and unpublished work, and 3) be written by a student or students at an accredited college or university. Undergraduates and graduate students may enter, and college graduates are eligible for one year after receiving their degree.

The papers will be judged on the basis of their: a) contribution to expanding understanding of contemporary American society, b) development and testing of social science models and theories, c) statistical and methodological sophistication, and d) clarity of writing and organization. Papers should be fewer than 40 pages in length (including tables, references, appendices, etc.) and should be double-spaced.

Separate prizes will be awarded to the best undergraduate and best graduate-level entries. Entrants should indicate in which group they are competing. Winners will receive a cash prize of $250, a commemorative plaque, and SPSS Base, the main statistical analysis package of SPSS. Honorable mentions may also be awarded by the judges.

Two copies of each paper must be received by February 15, 2000. The winner will be announced in late April 2000. Send entries to: Tom W. Smith, General Social Survey, National Opinion Research Center, 1155 East 60th St., Chicago, IL 60637.

For further information:
Phone: (773) 256-6288
Fax: (773) 753-7886
Email: smitht@norcmail.uchicago.edu

Internet Corner


The Economagic site was initially set up in 1996 to provide economic time series for students in a class on applied economic forecasting. Since that time, it has grown to include over 100,000 time series, including: interest rates from the Federal Reserve; retail sales and building permits from the Census Bureau; labor, employment, CPI and PPI from the Bureau of Labor Statistics; stock market highs and lows; and more.

The time series are available as charts or Excel spreadsheets. In addition, multiple time series can be saved in a personal workspace and transformed through various calculations.

The Web address for the Economagic site is http://www.economagic.com/.


HCUPnet is part of the Healthcare Cost and Utilization Project, developed and maintained by the Agency for Healthcare Policy and Research. The HCUPnet interface provides access to national (U.S.) statistics about hospital stays, using 1996 data from the Nationwide Inpatient Sample. Users can generate custom tables by selecting specific conditions of interest, outcomes or measures such as length of stay or in-hospital death, and types of patients or hospitals to compare.

The tables can be saved as HTML files and then opened in either a word processor or spreadsheet. The Web address for HCUPnet is http://www.ahcpr.gov/data/hcup/HCUPnet.htm.

New Release of WLS

As of December 1999, a new release has been issued of all three waves of the Wisconsin Longitudinal Study, as well as a new version of the extract program, WLSGV. The major change to waves 2 and 3 is the addition of the 1990 Census-based occupation codes. For details on how the new releases differ from the previous editions, see Change Notices #18, #19, and #20 at http://dpls.dacc.wisc.edu/WLS/updates.htm.

DPLS Staff News

Please join us in welcoming the newest member of the DPLS staff, 50% Associate Special Librarian Joanne Juhnke. A graduate of the University of Michigan-Ann Arbor Library Science program, Joanne comes to DPLS from the St. Mary’s College of Maryland library where she was a reference librarian from 1995 to 1998. Her experience with Web development, electronic database products, and her commitment to user services are a fine match for her duties as Webmaster and Newsletter Editor at DPLS. Please take a moment to say hello to Joanne the next time you’re in 3308 Social Science.

