DPLS News, September 2003

Please note: Older issues of the newsletter are likely to contain
broken links -- the newsletter is presented here "as published."

DPLS News contains articles about local, national, and international data issues.
It is published twice a semester by the library staff.

Editor: Joanne Juhnke, Special Librarian
Contributors: Lu Chou, Senior Special Librarian, & Cindy Severt, Senior Special Librarian




Table of Contents
Census Data All Over the Web
Virus Protection in the Social Science Building
Researcher’s Notes, by Shiu-Sheng Chen
DataFerrett Updates!
Staff News
New Terms, Definitions for Statistical Areas
NES 2002: Update
How to Create a Thematic Map Using American FactFinder
Using DataFerrett to Create a Simple Table, Employment Status By Regions

Internet Corner

Davidson Data Center and Network (DDCN)
Cancer.gov: Surveillance, Epidemiology, and End Results (SEER)
BP Statistical Review of World Energy
Nationmaster.com



Census Data All Over the Web

In recent years and especially since the release of 2000 U.S. Census data, there has been a proliferation of online interfaces designed to present the data for web-surfing users.

The granddaddy of them all is the Census Bureau’s own American FactFinder, http://factfinder.census.gov/. American FactFinder offers access to the summary files for Census 2000, along with the 1990 Census, American Community Survey, and 1997 Economic Census. Users can create thematic or reference maps (see Newsletter insert for a thematic map example); display standard or custom tables in HTML or download them in various formats; or FTP raw census data.

For a focus on racial and ethnic change in metropolitan areas from 1990 to 2000, the Lewis Mumford Center at SUNY-Albany has created a site at http://www.albany.edu/mumford/census/. The site’s data page offers 11 topic choices, including segregation in the population as a whole, school segregation, and homeowners/renters. For each topic, users can access data in HTML by selecting a metro area or viewing sortable lists, or in Excel by downloading.

There are several census data sites presented by individual states as well. The Missouri Census Data Center (MCDC), http://mcdc.missouri.edu/, starts with Missouri census data but includes nationwide components and other economic indicators as well. Highlights include the UEXPLORE application that lets advanced users explore and download from the MCDC data archive; the MCDC Internet Map Server (GIS for non-experts); and a new Circular Area Profiles online application that generates aggregate Summary File 3 (SF3) data for a circular area with a user-defined center and radius.
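To make the idea of a circular area profile concrete, here is a minimal sketch of aggregating a count over small areas whose centroids fall within a given radius of a center point. The article does not describe how the MCDC application works internally, so the centroid test, the function names, and the sample records below are all assumptions made for illustration, and the figures are made up rather than taken from SF3.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def circular_area_total(areas, center_lat, center_lon, radius_miles, field):
    """Sum `field` over areas whose centroid lies inside the circle."""
    return sum(
        area[field]
        for area in areas
        if haversine_miles(center_lat, center_lon,
                           area["lat"], area["lon"]) <= radius_miles
    )


# Hypothetical small areas: made-up centroids and counts, not real census data.
areas = [
    {"name": "A", "lat": 38.95, "lon": -92.33, "population": 1200},
    {"name": "B", "lat": 38.96, "lon": -92.30, "population": 800},
    {"name": "C", "lat": 39.50, "lon": -92.33, "population": 5000},
]

total = circular_area_total(areas, 38.95, -92.33, 5.0, "population")
print(total)
```

Areas A and B lie within five miles of the chosen center and are summed; area C, roughly 38 miles away, is excluded.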

Other state-based sites combine census data with data from other sources. The WisStat site at http://www.wisstat.wisc.edu/ is a project of the UW-Extension and the Wisconsin Department of Administration. The site uses the Wisconsin data from the Summary (Tape) Files 1 and 3 from 1990 and 2000, plus data collected by Wisconsin government. Users select variables to create tables to view or download.

The Stats Indiana site, http://www.stats.indiana.edu/, from the Kelley School of Business at Indiana University, uses census data as one component in providing “Data for Indiana and the Nation.” Data from the Bureau of Labor Statistics and the Bureau of Economic Analysis also feed into the site’s in-depth profiles of Indiana counties and regions, and its U.S. Counties IN Profile feature allows side-by-side county comparisons nationwide.

DPLS’ own Internet Crossroads in Social Science Data, http://dpls.dacc.wisc.edu/newcrossroads/, contains still more examples of sites doing their own thing with U.S. census data. Do you know of others we should add?


Virus Protection in the Social Science Building

Though DPLS interacts “virtually” with many users, our physical presence in the Social Science Building prompts us to pass along a word of computer-virus precaution.

As the semester has started, people have been bringing laptops into the building from elsewhere. If these computers are virus-infected, the virus/worm may try to spread as soon as they are connected to the building network and booted up. Because such connections are already behind the building’s firewall, any Windows PC on the network that is not properly protected can then be infected.

For this reason, the Social Science Computing Cooperative (SSCC) requests the following: If you are bringing in a Windows PC or laptop for use in the Social Science Building, please take it to the SSCC Help Desk in Soc. Sci. 4315 prior to connecting it to the building network. They will check it and install any necessary patches. This process usually takes 30-60 minutes. SSCC’s Help Desk is open 8-12 and 1-4, Monday through Friday.

Thank you for helping to keep the Social Science Building free of computer viruses!



Researcher’s Notes
by Shiu-Sheng Chen

I am a Ph.D. candidate and a project assistant in the Department of Economics, UW-Madison. Both my own research and the PA project deal with a lot of macroeconomic and financial time series data from developed and developing countries. Collecting all the data I need is a challenging and time-consuming task. Fortunately, the DPLS staff introduced me to a powerful database, Datastream, that makes the job easy.

You may wonder why I need to get data from Datastream, since macroeconomic and financial data are also available from many other sources. For instance, one can download U.S. data from the Bureau of Economic Analysis (BEA), the Bureau of Labor Statistics (BLS), and the Federal Reserve Banks (the Fed). For data from countries other than the U.S., one can use either International Financial Statistics (IFS) or the Organization for Economic Cooperation and Development (OECD) historical database.

From my personal experience, however, there are three reasons I strongly recommend Datastream. First, Datastream contains high-frequency data. If you want to explore high-frequency financial time series, Datastream often has both weekly and daily data, while IFS and OECD provide only monthly, quarterly, and annual data. Moreover, Datastream also contains the data from IFS and OECD. Second, Datastream provides much more financial data than IFS. You can easily obtain time series of stock indices, bond indices, and many short-term and long-term interest rates for any country you would like to study.

Finally, it is unbelievably easy to learn how to use Datastream since the interface is very user-friendly. In addition, DPLS staff can give you a brief introduction and are willing to help you to solve any problems you may encounter.


DataFerrett Updates!

In July the U.S. Census Bureau released an updated version of the DataFerrett application. The latest version adds new functions and several new studies, including U.S. Census Summary (Tape) Files 1 and 3 from 1990 and 2000; and Public Use Microdata Sample (PUMS) 1% and 5% files from 1990 and 1% file from 2000.

Longtime DataFerrett users should be aware that the http://ferret.bls.census.gov version of Ferrett is no longer being supported. DPLS has installed the DataFerrett application on our public-use PCs. One can use this application to view various studies or create small data subsets or cross-tabs at our library. Tutorials for DataFerrett can be found at http://dataferrett.census.gov. In addition, see the Newsletter insert for an example of creating a simple crosstab using DataFerrett.


Staff News

Please join us in welcoming two new members of the DACC staff. Dr. Steven Durlauf, Kenneth J. Arrow Professor in the Department of Economics, joins us as our new director, as we bid farewell to Dr. Kenneth Mayer, seven-year veteran of the position (and currently on sabbatical). Professor Durlauf’s research includes social interactions, income inequality, economic growth, and applications of decision theory to econometrics. His fields of teaching are macroeconomics and monetary economics, and econometrics.

Also joining us this fall as Network Administrator is Brian De Smet. Brian is a 2002 graduate of the University of Iowa with a degree in computer science. As an undergrad, Brian was already managing a heterogeneous network of hardware and operating systems. We hear Brian also has a flair for mixing ingredients in the kitchen, and we hope he shares his culinary talents here too!


New Terms, Definitions for Statistical Areas

Every ten years, the U.S. Office of Management and Budget revises the standards and definitions of statistical areas based around population and employment centers. The most recent definitions were announced in June 2003.

The new standards and definitions come with new terminology. For the 2000 Census, the collective term has been changed to “core-based statistical areas,” which now includes metropolitan statistical areas, micropolitan statistical areas, combined statistical areas (combinations of metropolitan and micropolitan statistical areas), and New England City and Town Areas.

Micropolitan statistical areas, a new category in this revision, have at least one urban cluster of at least 10,000 but less than 50,000 population. Meanwhile, two other terms have been retired: primary metropolitan statistical areas (PMSAs) and consolidated metropolitan statistical areas (CMSAs) are no longer part of the definitions.
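The population thresholds above can be expressed as a small classification rule. The sketch below is only an illustration: the function name is hypothetical, the 10,000 to 50,000 range for micropolitan areas comes from this article, and the 50,000-and-over cutoff for metropolitan areas is an assumption drawn from the OMB standards rather than from the text here. The actual definitions also involve counties and commuting ties, which this sketch ignores.

```python
def classify_core(largest_core_population):
    """Classify an area by the population of its largest urban core.

    Assumption: a core of 50,000 or more makes the area metropolitan;
    the article itself states only the micropolitan range.
    """
    if largest_core_population >= 50_000:
        return "metropolitan"
    if largest_core_population >= 10_000:
        return "micropolitan"
    return "outside core-based statistical areas"


# Hypothetical core populations, chosen to sit on either side of each cutoff.
for population in (9_999, 10_000, 49_999, 50_000):
    print(population, classify_core(population))
```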

For more details on both current and historical definitions, see the Census Bureau web site at http://www.census.gov/population/www/estimates/metrodef.html.


NES 2002: Update

The NES 2002 Full Release dataset is now available for download from the NES web site at http://www.umich.edu/~nes. From the left column of the main page, choose “Download Data and Codebooks for Free,” register, and look under “Time-Series Studies.” The NES 2002 Advance Release was distributed on February 28, 2003. Its codebook, data, and data descriptor files are not compatible with this Full Release file. The Full Release contains variables that were not in the Advance Release, and column locations and variable names differ between the two files.


Internet Corner

Davidson Data Center and Network (DDCN)

The Davidson Data Center and Network (DDCN) specializes in data on transition and emerging market economies. The Center provides an archive of datasets for direct download (free registration required) along with outstandingly detailed annotated links to data from other providers, both free and fee-based. A single interface provides access to both DDCN data and off-site links.

The DDCN is a project of the William Davidson Institute at the University of Michigan, and is sponsored by the National Science Foundation. The site can be found at http://ddcn.prowebis.com/.

Cancer.gov: Surveillance, Epidemiology, and End Results (SEER)

The SEER Program currently collects and publishes cancer incidence and survival data from 11 population-based cancer registries and three supplemental registries covering approximately 14 percent of the population of the United States. Data includes patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up. The site carries numerous reports, tables, and graphs based on the data in the SEER Public-Use database. Direct access to the database is free of charge, but requires a signed public-use agreement, outlined on the site. Also available are several software tools for working with the database.

The address for the SEER site is http://seer.cancer.gov/.

BP Statistical Review of World Energy

BP (formerly British Petroleum) has been publishing a Statistical Review of World Energy since 1951. The most recent edition appeared in June 2003, with data as recent as 2002. The publication is on the site in both HTML and PDF, with charts and maps in PowerPoint and downloadable data in Excel 2000. Most of the time series go back to the 1960s or 1970s, with annual crude oil prices back as far as 1861.

The site also includes an energy-charting tool that allows exportable graphic representations based on data in the review, and some simple calculations. The tool is Java-based and requires at least Internet Explorer 5.0 or Netscape 6.2.

The BP Statistical Review of World Energy is available at http://www.bp.com/centres/energy/.

Nationmaster.com

Nationmaster.com was founded as an engine for comparing countries using the figures from the latest CIA World Factbook. Other freely-available sources have been added, so that statistics range from pesticide use to web-site defacements to Olympic gold medals. Results for each query consist of a single data point for each country, and are displayed as a comparison in an HTML table with a bar graph.

The site aims its appeal at a popular rather than an academic audience, though sources and methods are documented. The site is most useful for cursory comparisons and also for tracking an indicator back to its source, where more in-depth data may be available.

Visit the site at http://www.nationmaster.com/.


How to Create a Thematic Map Using American FactFinder

This example explains how to create a map using 2000 Census data to depict the percentage of occupied housing units in Madison, WI that are renter-occupied.

  1. Go to http://factfinder.census.gov (or go to http://www.census.gov and click on American FactFinder in the left-hand column).

  2. Scroll past Basic Facts and Data Sets to Maps in the wide middle column. Click on Thematic Maps. Notice that the default map depicted is of population density by age for the United States.

  3. From the line near the top of the page that says You are here: Main>All Data Sets/Data Sets with Thematic Maps>Geography>Themes>Results, click on Geography. Select Place as Geographic Type, and Wisconsin as the State. When the screen is refreshed, select Madison city as the geographic area in the third window. Click Show Result. Notice that the resulting map for Madison, WI has the default theme of population density by age. Clicking Map It would have generated an unthemed map.

  4. Change the theme of the map by selecting Themes from the “You are here” line near the top of the page. Scroll down and highlight Percent of occupied housing units that are renter-occupied. Click Show Result. Notice that the highest density of renter-occupied housing is on the Isthmus and near campus. Notice, too, that the map is drawn by Census Tract by default. Compare the results obtained when the map is redrawn by Block Group or by Block using the pull-down menu.

[Figure: Map with renter-occupied units]
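For readers curious about what the thematic map is computing, the sketch below works out the percentage of occupied housing units that are renter-occupied for each tract and assigns each tract to a shading class. It is only an illustration: the tract names and counts are made up, and the class breaks are hypothetical rather than the ones American FactFinder uses.

```python
def percent_renter_occupied(renter_occupied, occupied):
    """Percent of occupied housing units that are renter-occupied."""
    if occupied == 0:
        return None  # no occupied units, so the percentage is undefined
    return 100.0 * renter_occupied / occupied


def shade_class(percent, breaks=(20, 40, 60, 80)):
    """Return a class index from 0 (lightest) to len(breaks) (darkest).

    The break points are hypothetical; a real map may choose its own.
    """
    if percent is None:
        return None
    return sum(percent >= b for b in breaks)


# Hypothetical tracts: (renter-occupied units, occupied units), made-up counts.
tracts = {
    "tract 1": (450, 500),
    "tract 2": (120, 600),
    "tract 3": (0, 0),
}

classes = {
    name: shade_class(percent_renter_occupied(renters, occupied))
    for name, (renters, occupied) in tracts.items()
}
print(classes)
```

Redrawing the map by Block Group or Block amounts to running the same calculation over smaller areas, which is why the pattern can change with the level of geography.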


Using DataFerrett to Create a Simple Table, Employment Status by Regions

DataFerrett is intended for extracting a relatively small number of variables from a dataset or creating customized crosstabs or frequencies. In this example, we will create a table using two variables from the July 2003 Current Population Survey data file available from the DataWeb in DataFerrett.

  1. Launch the DataFerrett program and log in with your email address.

  2. Under the Start Tab, choose Search Datasets by Topics and Themes.

  3. Under the Microdata Tab, start with Step 1: Select Datasets & Variables. In the left-hand column, open the folder for Current Population Survey, then the folder for Basic, and then double-click Jul 2003 to select it.

  4. In the Ferrett Topics Window, check Labor Force Variables and Geography Variables; then click OK.

  5. Hold down the CTRL key and highlight these two variables from the list that appears: Labor Force-Employment Status (PEMLR) and Geography-Region (GEREG).

  6. Click on the Review/Browse Highlighted Variables button to view the selected variables.

  7. In the Browse/Select Variables & Values window, check the Select All Variables box and then click OK.

  8. Skip Step 2: Data Shopping Basket; this example does not use the Data Shopping Basket features.

  9. Under Step 3: Download/Make a Table, click on the Tabulate button.

  10. A tabulation area will appear on the left of the screen, waiting for you to identify where you would like your variables displayed. Use the mouse to “drag and drop” Labor Force-Employment Status to R1 in the tabulation area and Geography-Region to C1.

  11. Click on the GO button in the toolbar to retrieve the data.

  12. Under File in the top menu bar, use Save As to save the result in a format you like. Options include HTML, tab-delimited text, comma-delimited text, or Ferrett’s proprietary tabulation format. Your file will be saved in the C:\theDataWeb folder by default.

The DataFerrett program is installed on the public-use PCs at DPLS, and is also available for download at http://www.thedataweb.org/browser.html. There is an extensive user’s guide to the DataFerrett program at http://www.thedataweb.org/support/user/.
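As a rough picture of what the tabulation in steps 10 and 11 produces, the sketch below counts records by employment status (rows) and region (columns). It is not how DataFerrett is implemented. The variable names PEMLR and GEREG come from the steps above, but the records and value labels are made up, the function names are hypothetical, and the counts are unweighted, whereas published CPS tabulations normally apply survey weights.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count records for each (row value, column value) pair."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)


def format_table(counts):
    """Lay out counts as tab-delimited text, rows down and columns across."""
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    lines = ["\t".join([""] + cols)]
    for r in rows:
        lines.append("\t".join([r] + [str(counts[(r, c)]) for c in cols]))
    return "\n".join(lines)


# Hypothetical microdata records with simplified, made-up value labels.
records = [
    {"PEMLR": "employed", "GEREG": "Midwest"},
    {"PEMLR": "employed", "GEREG": "Midwest"},
    {"PEMLR": "unemployed", "GEREG": "Midwest"},
    {"PEMLR": "employed", "GEREG": "South"},
    {"PEMLR": "not in labor force", "GEREG": "South"},
]

table = crosstab(records, "PEMLR", "GEREG")
print(format_table(table))
```

Placing employment status in R1 and region in C1 corresponds to passing PEMLR as the row variable and GEREG as the column variable.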
