Please note: Older issues of the newsletter are likely to contain
broken links -- the newsletter is presented here "as published."
DPLS News contains articles about local, national, and international
It is published twice a semester by the library staff.
Editor: Joanne Juhnke, Special Librarian
Contributors: Lu Chou, Senior Special Librarian, & Cindy Severt, Senior Special Librarian
Table of Contents
Census Data All Over the Web
Virus Protection in the Social Science Building
Researcher's Notes, by Shiu-Sheng Chen
New Terms, Definitions for Statistical Areas
NES 2002: Update
How to Create a Thematic Map Using American FactFinder
Using DataFerrett to Create a Simple Table, Employment Status By Regions
In recent years, and especially since the release of 2000 U.S. Census data, there has been a proliferation of online interfaces designed to present the data for web-surfing users.
The granddaddy of them all is the Census Bureau's own American FactFinder, http://factfinder.census.gov/. American FactFinder offers access to the summary files for Census 2000, along with the 1990 Census, the American Community Survey, and the 1997 Economic Census. Users can create thematic or reference maps (see Newsletter insert for a thematic map example); display standard or custom tables in HTML or download them in various formats; or retrieve raw census data by FTP.
For a focus on racial and ethnic change in metropolitan areas from 1990 to 2000, the Lewis Mumford Center at SUNY-Albany has created a site at http://www.albany.edu/mumford/census/. The site's data page offers 11 topic choices, including segregation in the population as a whole, school segregation, and homeowners/renters. For each topic, users can access data in HTML by selecting a metro area or viewing sortable lists, or in Excel by downloading.
Individual states also present several census data sites. The Missouri Census Data Center (MCDC), http://mcdc.missouri.edu/, starts with Missouri census data but includes nationwide components and other economic indicators as well. Highlights include the UEXPLORE application that lets advanced users explore and download from the MCDC data archive; the MCDC Internet Map Server (GIS for non-experts); and a new Circular Area Profiles online application that generates aggregate Summary File 3 (SF3) data for a circular area with a user-defined center and radius.
Other state-based sites combine census data with data from other sources. The WisStat site at http://www.wisstat.wisc.edu/ is a project of the UW-Extension and the Wisconsin Department of Administration. The site uses the Wisconsin data from the Summary (Tape) Files 1 and 3 from 1990 and 2000, plus data collected by Wisconsin government. Users select variables to create tables to view or download.
The Stats Indiana site, http://www.stats.indiana.edu/, from the Kelley School of Business at Indiana University, uses census data as one component in providing "Data for Indiana and the Nation." Data from the Bureau of Labor Statistics and the Bureau of Economic Analysis also feed into the site's in-depth profiles of Indiana counties and regions, and its U.S. Counties IN Profile feature allows side-by-side county comparisons nationwide.
DPLS's own Internet Crossroads in Social Science Data, http://dpls.dacc.wisc.edu/newcrossroads/,
contains still more examples of sites doing their own thing with U.S.
census data. Do you know of others we should add?
Though DPLS interacts virtually with many users, our physical presence in the Social Science Building prompts us to pass along a word of computer-virus precaution.
Now that the semester has started, people have been bringing laptops into the building from elsewhere. If one of these computers is infected, the virus or worm may try to spread as soon as the computer is connected to the building network and booted up. Since such connections are already behind the building's firewall, any Windows PC already on the network and not properly protected can then be infected.
For this reason, the Social Science Computing Cooperative (SSCC) requests the following: If you are bringing in a Windows PC or laptop for use in the Social Science Building, please take it to the SSCC Help Desk in Soc. Sci. 4315 prior to connecting it to the building network. They will check it and install any necessary patches. This process usually takes 30-60 minutes. SSCC's Help Desk is open 8-12 and 1-4, Monday through Friday.
Thank you for helping to keep the Social Science Building free of computer viruses.
I am a Ph.D. candidate and a project assistant in the Department of Economics, UW-Madison. Both my own research and the PA project deal with a lot of macroeconomic and financial time series data from developed and developing countries. Collecting all the data I need is a challenging and time-consuming task. Fortunately, the DPLS staff introduced me to a powerful database, Datastream, that gets the job done easily.
You may wonder why I need to get data from Datastream, since macroeconomic and financial data are also available from many other sources. For instance, one can download U.S. data from the Bureau of Economic Analysis (BEA), the Bureau of Labor Statistics (BLS), and the Federal Reserve Banks (the Fed). For data from countries other than the U.S., one can use either International Financial Statistics (IFS) or the Organization for Economic Cooperation and Development (OECD) historical database.
From my personal experience, however, there are three reasons why I strongly recommend Datastream. First, Datastream contains high-frequency data. If you want to explore high-frequency financial time series, Datastream often has both weekly and daily data, while IFS and OECD provide only monthly, quarterly, and annual data. Moreover, Datastream also contains the data from IFS and OECD. Second, Datastream provides much more financial data than IFS. You can easily obtain time series of stock indices, bond indices, and many short-term and long-term interest rates for any country you would like to study.
Finally, it is unbelievably easy to learn how to use Datastream since the interface is very user-friendly. In addition, DPLS staff can give you a brief introduction and are willing to help you to solve any problems you may encounter.
In July the U.S. Census Bureau released an updated version of the DataFerrett application. The latest version adds new functions and several new studies, including U.S. Census Summary (Tape) Files 1 and 3 from 1990 and 2000, and Public Use Microdata Sample (PUMS) 1% and 5% files from 1990 and the 1% file from 2000.
Longtime DataFerrett users should be aware that the http://ferret.bls.census.gov version of Ferrett is no longer being supported. DPLS has installed the DataFerrett application on our public-use PCs. One can use this application to view various studies or create small data subsets or crosstabs at our library. Tutorials for DataFerrett can be found at http://dataferrett.census.gov. In addition, see the Newsletter insert for an example of creating a simple crosstab using DataFerrett.
Please join us in welcoming two new members of the DACC staff. Dr. Steven Durlauf, Kenneth J. Arrow Professor in the Department of Economics, joins us as our new director, as we bid farewell to Dr. Kenneth Mayer, seven-year veteran of the position (and currently on sabbatical). Professor Durlauf's research includes social interactions, income inequality, economic growth, and applications of decision theory to econometrics. His fields of teaching are macroeconomics and monetary economics, and econometrics.
Also joining us this fall as Network Administrator is Brian De Smet. Brian is a 2002 graduate of the University of Iowa with a degree in computer science. As an undergrad, Brian was already managing a heterogeneous network of hardware and operating systems. We hear Brian also has a flair for mixing ingredients in the kitchen, and we hope he shares his culinary talents here too!
Every ten years, the U.S. Office of Management and Budget revises the standards and definitions of statistical areas based around population and employment centers. The most recent definitions were announced in June 2003.
The new standards and definitions come with new terminology. For the 2000 Census, the collective term has been changed to core-based statistical areas, a category that now includes metropolitan statistical areas, micropolitan statistical areas, combined statistical areas (combinations of metropolitan and micropolitan statistical areas), and New England City and Town Areas.
Micropolitan statistical areas, a new category in this revision, have at least one urban cluster with a population of at least 10,000 but less than 50,000. Meanwhile, two other terms have been retired: primary metropolitan statistical areas (PMSAs) and consolidated metropolitan statistical areas (CMSAs) are no longer part of the definitions.
For more details on both current and historical definitions, see the
Census Bureau web site at http://www.census.gov/population/www/estimates/metrodef.html.
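
The population thresholds described above can be illustrated with a short sketch. The function name is hypothetical, and the 50,000 floor for a metropolitan core is an assumption inferred from the upper bound of the micropolitan range given in the article. The official standards also weigh other criteria, such as commuting ties between counties, which this sketch ignores.

```python
def classify_core(core_population: int) -> str:
    """Classify an area by the population of its largest urban core.

    Assumption: only the core-population thresholds are checked;
    the official definitions involve additional criteria.
    """
    if core_population < 0:
        raise ValueError("population cannot be negative")
    if core_population >= 50_000:
        # Assumed floor for a metropolitan core (not stated in the article).
        return "metropolitan"
    if core_population >= 10_000:
        # Range given in the article: at least 10,000 but less than 50,000.
        return "micropolitan"
    return "outside core-based statistical areas"


if __name__ == "__main__":
    # Hypothetical core populations, chosen only to exercise each branch.
    for population in (8_500, 10_000, 49_999, 50_000):
        print(population, classify_core(population))
```

Note that the lower bound of each range is inclusive, so a core of exactly 10,000 counts as micropolitan.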
The NES 2002 Full Release dataset is now available for
download from the NES web site at http://www.umich.edu/~nes.
From the left column of the main page, choose "Download Data and
Codebooks for Free", register, and look under "Time-Series
Studies". The NES 2002 Advance Release was distributed on February
28, 2003. Its codebook, data, and data descriptor files are not compatible
with this Full Release file. The Full Release contains variables
that were not in the Advance Release, and column locations and variable
names are different between the two files.
Davidson Data Center and Network (DDCN)
The Davidson Data Center and Network (DDCN) specializes in data on transition and emerging market economies. The Center provides an archive of datasets for direct download (free registration required) along with exceptionally detailed annotated links to data from other providers, both free and fee-based. A single interface provides access to both DDCN data and off-site links.
The DDCN is a project of the William Davidson Institute at the University
of Michigan, and is sponsored by the National Science Foundation. The
site can be found at http://ddcn.prowebis.com/.
The SEER Program currently collects and publishes cancer incidence and survival data from 11 population-based cancer registries and three supplemental registries covering approximately 14 percent of the population of the United States. Data includes patient demographics, primary tumor site, morphology, stage at diagnosis, first course of treatment, and follow-up. The site carries numerous reports, tables, and graphs based on the data in the SEER Public-Use database. Direct access to the database is free of charge, but requires a signed public-use agreement, outlined on the site. Also available are several software tools for working with the database.
The address for the SEER site is http://seer.cancer.gov/.
BP (formerly British Petroleum) has been publishing a Statistical Review of World Energy since 1951. The most recent edition appeared in June 2003, with data as recent as 2002. The publication is on the site in both HTML and PDF, with charts and maps in PowerPoint and downloadable data in Excel 2000. Most of the time series go back to the 1960s or 1970s, with annual crude oil prices back as far as 1861.
The site also includes an energy-charting tool that allows exportable graphic representations based on data in the review, and some simple calculations. The tool is Java-based and requires at least Internet Explorer 5.0 or Netscape 6.2.
The BP Statistical Review of World Energy is available at http://www.bp.com/centres/energy/.
Nationmaster.com was founded as an engine for comparing countries using the figures from the latest CIA World Factbook. Other freely available sources have since been added, so that statistics range from pesticide use to web-site defacements to Olympic gold medals. Results for each query consist of a single data point for each country, and are displayed as a comparison in an HTML table with a bar graph.
The site aims its appeal at a popular rather than an academic audience, though sources and methods are documented. The site is most useful for cursory comparisons and also for tracking an indicator back to its source, where more in-depth data may be available.
Visit the site at http://www.nationmaster.com/.
This example explains how to create a map using 2000 Census data to depict the percentage of occupied housing units in Madison, WI that are renter-occupied.
- Go to http://factfinder.census.gov (or go to http://www.census.gov and click on "American FactFinder" in the left-hand column).
- Scroll past "Basic Facts" and "Data Sets" to "Maps" in the wide middle
column. Click on "Thematic Maps". Notice that the default map depicted
is of population density by age for the United States.
- From the line near the top of the page that says "You are here: Main>All
Data Sets/Data Sets with Thematic Maps>Geography>Themes>Results",
click on "Geography". Select "Place" as Geographic Type, and "Wisconsin"
as the State. When the screen is refreshed, select "Madison city" as
the geographic area in the third window. Click "Show Result". Notice
that the resulting map for Madison, WI has the default theme of population
density by age. Clicking "Map It" would have generated an unthemed map.
- Change the theme of the map by selecting "Themes" from the "You
are here" line near the top of the page. Scroll down and highlight
"Percent of occupied housing units that are renter-occupied". Click
"Show Result". Notice that the highest density of renter-occupied housing
is on the Isthmus and near campus. Notice, too, that the map is drawn
by default by Census Tract. Notice the different results obtained
when the map is redrawn by Block Group or by Block using the pull-down menu.
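
The theme mapped in the steps above is a simple ratio, which can be sketched as follows. The tract labels and housing-unit counts are made up for illustration and are not actual Census 2000 figures for Madison; the function name is also hypothetical.

```python
from typing import Optional


def percent_renter_occupied(renter_occupied: int, occupied: int) -> Optional[float]:
    """Return renter-occupied units as a percentage of occupied units.

    Returns None when an area has no occupied housing units,
    since the percentage is undefined in that case.
    """
    if occupied == 0:
        return None
    return 100.0 * renter_occupied / occupied


if __name__ == "__main__":
    # Hypothetical areas: (renter-occupied units, occupied units).
    tracts = {
        "Tract A": (1200, 1500),
        "Tract B": (300, 1200),
        "Tract C": (0, 0),
    }
    for name, (renters, occupied) in tracts.items():
        print(name, percent_renter_occupied(renters, occupied))
```

The same calculation applies at any level of geography, which is why redrawing the map by tract, block group, or block can give different pictures of the same city.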
DataFerrett is intended for extracting a relatively small number of variables from a dataset or creating customized crosstabs or frequencies. In this example, we will create a table using two variables from the July 2003 Current Population Survey data file available from the DataWeb in DataFerrett.
- Launch the DataFerrett program and log in with your email address.
- Under the "Start" tab, choose "Search Datasets by Topics and Themes".
- Under the "Microdata" tab, start with "Step 1: Select Datasets &
Variables". In the left-hand column, open the folder for "Current Population
Survey", then the folder for "Basic", and then double-click "Jul 2003"
to select it.
- In the "Ferrett Topics" window, check "Labor Force Variables" and "Geography
Variables"; then click "OK".
- Hold down the CTRL key and highlight these two variables from the
list that appears: Labor Force-Employment Status (PEMLR) and Geography-Region
- Click on the "Review/Browse Highlighted Variables" button to view
the selected variables.
- In the "Browse/Select Variables & Values" window, check the "Select
All Variables" box and then click "OK".
- Skip "Step 2: Data Shopping Basket"; this example does not use the
Data Shopping Basket features.
- In "Step 3: Download/Make a Table", click on the "Tabulate" button.
- A tabulation area will appear on the left of the screen, waiting
for you to identify where you would like your variables displayed.
Use the mouse to drag and drop "Labor Force-Employment
Status" to R1 in the tabulation area and "Geography-Region" to C1.
- Click on the "GO" button in the toolbar to retrieve the data.
- Under "File" in the top menu bar, use "Save As" to save the result in a format you like. Options include HTML, tab-delimited text, comma-delimited text, or Ferrett's proprietary tabulation format. Your file will be saved in the C:\theDataWeb folder by default.
The DataFerrett program is installed on the public-use PCs at DPLS, and is also available for download at http://www.thedataweb.org/browser.html. There is an extensive user's guide to the DataFerrett program at http://www.thedataweb.org/support/user/.
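
For readers curious about what the tabulation step does, the sketch below builds the same kind of two-way table (employment status in rows, region in columns) in Python. It is not DataFerrett's own code. The records are invented, the function names are hypothetical, and the counts are unweighted, whereas a real survey tabulation would normally apply survey weights.

```python
from collections import Counter

# Hypothetical respondent records: (employment status, region).
records = [
    ("Employed", "Midwest"),
    ("Employed", "South"),
    ("Unemployed", "Midwest"),
    ("Not in labor force", "West"),
    ("Employed", "Midwest"),
]


def crosstab(rows):
    """Count how many records fall in each (row value, column value) cell."""
    return Counter(rows)


def to_tab_delimited(counts):
    """Render the cell counts as tab-delimited text, one line per row value."""
    row_labels = sorted({row for row, _ in counts})
    col_labels = sorted({col for _, col in counts})
    lines = ["\t".join(["Employment status"] + col_labels)]
    for row in row_labels:
        # A Counter returns 0 for cells with no records.
        cells = [str(counts[(row, col)]) for col in col_labels]
        lines.append("\t".join([row] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_tab_delimited(crosstab(records)))
```

The tab-delimited output mirrors one of the "Save As" formats mentioned in the last step, so a table like this can be opened directly in a spreadsheet.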