Crossroads Category Browse Results

Data Extraction Tools and Methodology

American FactFinder (U.S. Bureau of the Census)
American FactFinder provides access to data about the United States, Puerto Rico and the Island Areas. The data in American FactFinder come from several censuses and surveys. They include the American Community Survey, the American Housing Survey, Annual Economic Surveys, Annual Surveys of Governments, the Census of Governments, the Commodity Flow Survey, the Decennial Census, the Economic Census, the Economic Census of the Island Areas, the Survey of Business Owners, the Equal Employment Opportunity (EEO) Tabulation, the Population Estimates Program, and the Puerto Rico Community Survey. Starting in June 2019, American FactFinder will have no new data releases. Its contents are migrating to a new Census Bureau platform. Learn more by watching this recorded webinar.

American National Standards Institute (ANSI) codes (U.S. Bureau of the Census)
"American National Standards Institute codes (ANSI codes) are a standardized set of numeric or alphabetic codes issued by the American National Standards Institute (ANSI) to ensure uniform identification of geographic entities through all federal government agencies. These standards replace the Federal Information Processing Standards (FIPS) codes previously issued by the National Institute of Standards and Technology (NIST)." The Census Bureau, as a major user of FIPS codes, provides this page with links to ANSI codes publications.

ArcGIS Explorer: Free GIS Data Viewer (esri (formerly Environmental Systems Research Institute))
From the web site: "ArcGIS Explorer is a free, downloadable GIS viewer that gives you an easy way to explore, visualize, and share GIS information." In addition to the free ArcGIS Explorer software, this site also offers free newsletters and networking with other users, plus information on esri's commercial products and services.

Area Health Resource File (AHRF) (Quality Resource Systems, Inc.)
"The Area Health Resource File (AHRF) is a database containing over 6,000 variables for each county in the US. AHRF formerly known as Area Resource File (ARF) is used for health service research, health policy analysis, and other geographically based activities." The ARF data from 1940-1990, as well as the 1999, 2005 and 2009-2010 releases, are available at DISC for UW-Madison campus users. The ARF website provides a search engine to identify which variables are available in the most recent annual release.

Argentina National Institute of Statistics and Censuses (INDEC) (Argentina National Institute of Statistics and Censuses)
Argentina National Institute of Statistics and Censuses (INDEC, Spanish acronym) "is the technical government agency responsible for the coordination and supervision of all public statistical activities taking place in the Argentine territory." INDEC's statistics include Population, Living Conditions, Employment and Income, Education, Health, Tourism and Culture, Price Indexes, Agricultural Sector, Mining and Energy, Industry and Construction, Trade and Services, Businesses, National Accounts, International Accounts, Foreign Trade, and New Information and Communication Technologies (ICTs).

Basic Tables: 1990 Demographic Profile Generator (Urban Information Center, University of Missouri-St. Louis)
"This application generates a single 1990 'Basic Tables' (demographic profile) report for any of the supported geographic units, including census tract, block group, city (no size limit), 5-digit ZIP code, state, county or metro area for anywhere in the United States. Enter only codes relevant to the area for which you want data." Although this is a terrific resource, it is not necessarily easy to use, primarily because areas are selected by entering geographic codes rather than by clicking on a selection list. The good news is that a "Lots of Helpful Examples" document is available to help you get started; it links to places where you can look up the codes you need.

CDC WONDER (U.S. Dept. of Health and Human Services, Centers for Disease Control)
CDC WONDER provides a gateway to a wide variety of reports and numeric public health data. Many of the links are to menu-based extraction systems that produce downloadable summary data tables. The gateway covers the following categories: chronic diseases, communicable diseases, environmental health, health practice and prevention, injury prevention, and occupational health. The site also has an A-to-Z topic index.

Census and Survey Processing System: CSPro (U.S. Bureau of the Census)
CSPro is a public domain software package for entering, editing, tabulating and mapping census and survey data. The software is used in over 160 countries and multiple organizations, from NGOs to universities, worldwide. Free registration required for download.

CensusScope (Social Science Data Analysis Network)
The Social Science Data Analysis Network (SSDAN) at the University of Michigan offers this point-and-click interface to Census 2000 data. Pre-selected topics for charts, maps, and trends let the user choose a state or metro area using drop-down menus. The graphics are eye-catching and suitable for printing.

Center for Social Research Methods (William M.K. Trochim)
This site offers materials and links for people involved in social research. Site highlights include The Knowledge Base, an online hypertext textbook on applied social research methods such as defining a research question, sampling, measurement, research design and data analysis; a simulation book of manual (such as dice-rolling) and computer simulation exercises of common research designs; and a statistical advisor that points users toward appropriate statistical measures based on answers to a series of questions. William Trochim, author of the site, is a professor at Cornell University.

Chance Course & Database (Dartmouth College)
The Chance Project was funded by the National Science Foundation from 1992 to 1996 to develop instructional materials for teaching basic probability and statistical concepts using examples drawn from current news and the real world. The Chance website contains a teacher's guide, syllabi and lecture notes, activities and datasets, and a current newsletter (web or e-mail) that culls up-to-date examples from current news reporting.

Conducting Research Surveys via Email and the Web (The Corporation and Matthias Schonlau, Ronald D. Fricker, Jr., Marc N. Elliott)
This 118-page publication examines the burgeoning trend of online research surveys. The authors carry out a literature review, discuss the advantages and disadvantages of online surveys, and offer practical suggestions for design and implementation. A chapter of case studies rounds out the publication. The seven chapters and three appendices may be downloaded together or separately in PDF.

(U.S. Bureau of the Census)
The U.S. Census Bureau has a new platform for visitors to access its data and digital content. Surveys and programs on the new platform include the 2017 Economic Census and the 2018 American Community Survey. American FactFinder and DataFerrett will be replaced by this new platform. Starting in June 2019, American FactFinder will have no new data releases. On the new platform, visitors can type words or phrases into a single search box or use advanced search by topic, geography, year, survey, and industry. Data can be downloaded in CSV format. OnTheMap, My Congressional District, and many other tools will continue to be available to Census data users. Check out Data Gems for experts' tips and how-to documents about this new microdata analysis system.

DataCite (DataCite)
DataCite is a global organization, based in London, that helps researchers to find, access, and reuse data. It provides persistent identifiers, specifically Digital Object Identifiers (DOIs), for research data to make them visible and accessible.

Dissemination Standards Bulletin Board (DSBB) (International Monetary Fund)
The DSBB provides information about two standards: the Special Data Dissemination Standard (SDDS), established in 1996 to guide countries that have, or that might seek, access to international capital markets in the dissemination of economic and financial data to the public; and the General Data Dissemination System (GDDS), established in 1997 to guide countries in providing comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data to the public.

Econometrics Laboratory Software Archive (ELSA) (Econometrics Laboratory, University of California, Berkeley)
The Econometrics Laboratory Software Archive (ELSA) of the Econometrics Laboratory at the University of California, Berkeley strives to facilitate the interchange of computational algorithms that have economic applications. ELSA offers a variety of algorithms, programs, software manuals, and econometrics-related datasets and textbooks for download.

FAIRMODEL Economic Model (Ray C. Fair)
The FAIRMODEL economic models from Yale provide macroeconomic analysis for free. The site allows users to "Work with a U.S. macroeconometric model (US model) or a multicountry econometric model (MC3 model) to forecast, do policy analysis, and examine historical episodes. Users can change government policy variables and examine the estimated effects of the changes, table and graph online and/or download all or part of the historical data, forecast data, and data you may have created, read online and/or download all the documentation, memos, and paper. Download for use on your own computer: the Fair-Parke (FP) program, the US model, the MC3 model, and the US model in EViews format. Users can also analyze a presidential vote equation, including examining Bush's chances in 2004, and perform stock market experiments."

Federal Justice Statistics Resource Center (FJSRC) (U.S. Bureau of Justice Statistics and Urban Institute)
The U.S. Bureau of Justice Statistics, through the Federal Justice Statistics Resource Center, "compiles comprehensive information describing suspects and defendants processed in the Federal criminal justice system. The goal of FJSRC is to provide uniform case processing statistics across all stages of the Federal criminal justice system." Its Federal Criminal Case Processing Statistics (FCCPS) tool is an interface used to analyze federal case processing data. Users can generate various statistics in the areas of federal law enforcement, prosecution/courts and incarcerations, and based on title and section of the U.S. Criminal Code.

Firearm-safety Among Children & Teens Consortium (FACTS) (Inter-university Consortium for Political and Social Research (ICPSR))
The Firearm Safety Among Children and Teens (FACTS) Consortium is funded by the National Institute of Child Health and Human Development. It is designed to develop research resources for firearm injury prevention. Datasets, methodology, research projects, publications, and other resources are available from this site.

FlowingData (Nathan Yau)

Since mid-2007 Nathan Yau, a PhD candidate in Statistics at UCLA, has been running the FlowingData blog. His interest is in data visualization, and his blog has attracted other like-minded data enthusiasts, who interact in a fascinating conversation, with lots of thought-provoking images and animations, on how data can be presented.

One useful category of post holds up data graphics from the media for critique by the FlowingData community. Another category presents visualizations created by the blog author himself, with animations on such topics as mapping the expansion of WalMart in the United States over time, and mapping the use of the word "inauguration" on Twitter messages worldwide in the hours surrounding the events of January 20, 2009 in Washington DC. FlowingData also holds contests for its readers to contribute visualizations based on a given dataset, while a forum page adds opportunities for reader input.

Gateway to Global Aging Data (Dornsife Center for Economic and Social Research (CESR), Program on Global Aging, Health & Policy, University of Southern California)
Gateway to Global Aging Data is a platform designed for harmonizing cross-national studies of aging with the Health and Retirement Study (HRS). It includes the Health and Retirement Study (HRS), Mexican Health and Ageing Study (MHAS), English Longitudinal Study on Ageing (ELSA), Survey of Health, Ageing, and Retirement in Europe (SHARE), Korean Longitudinal Study on Aging (KLoSA), Japanese Study on Aging and Retirement (JSTAR), Indonesia Family Life Survey (IFLS), China Health, Aging, and Retirement Longitudinal Study (CHARLS), Irish Longitudinal Study on Ageing (TILDA), Study on Global Ageing and Adult Health (SAGE), and Longitudinal Aging Study in India (LASI). Its digital library contains survey questions, sets of harmonized variables, and tools to search, compare, and obtain information from various health and retirement surveys from over 25 countries. Its Interactive Graphs and Tables page can produce population estimates in a graph or table format as well as data visualization on a map of the globe.

General Social Survey (GSS) (National Opinion Research Center (NORC))
"The General Social Survey (GSS) conducts basic scientific research on the structure and development of American society with a data-collection program designed to both monitor social change within the United States and to compare the United States to other nations." The GSS has been conducted regularly since 1972, and many of the core questions are unchanged to allow comparison across years. The GSS site at NORC allows users to search all GSS documents, browse GSS variables, download in SAS or SPSS, or analyze the data online in Nesstar. (The site also links to the SDA online analysis site at Berkeley.)

Geographic Codes Lookup (Missouri Census Data Center (MCDC))
The Missouri Census Data Center (MCDC) maintains this page for looking up standard codes for areas grouped into common census geographic summary levels. For the United States, it includes regions, divisions, states, Micropolitan/Metropolitan Statistical Areas (CBSAs) and places. For each state, it has counties, places, CBSAs, urban areas/clusters, school districts, and county subdivisions. Populations are shown in parentheses after each area on the list. Most codes are linked; each link leads to a page that lists multiple data sources for the selected area.

GIS Guide to Good Practice (Arts and Humanities Data Service (AHDS))
This UK-based site is for those who create, maintain, use and preserve GIS-based digital resources. Although the overall emphasis is upon archaeological data, the information presented has much wider disciplinary implications. As well as providing a source of useful generic information, the guide emphasises the processes of long-term preservation, archiving and effective data re-use.

Global Burden of Disease (GBD) data (The Institute for Health Metrics and Evaluation (IHME))
Global Burden of Disease (GBD) data captures premature death and disability from more than 300 diseases and injuries in 195 countries, by age and sex, from 1990 to the present, allowing comparisons over time, across age groups, and among populations. The Institute for Health Metrics and Evaluation (IHME) has built online data visualization tools to present recent GBD data using country profiles and US county profiles. Some GBD data can be downloaded from the Global Health Data Exchange (GHDx) site.

Google Refine (Google)
According to the Google Refine blog, "Google Refine is a power tool for working with messy datasets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases." The software is open-source and builds on a project called Freebase Gridworks, which Google acquired in July 2010. The software is downloadable, so that users can work on their own desktop computers without having to load their data to a distant server.

Health Data for All Ages (HDAA) (U.S. National Center for Health Statistics)
According to the HDAA site, "this site presents tables that provide CDC health statistics for infants, children, adolescents, adults, and older adults. You can customize tables with any or all of the following characteristics: age, gender, race/ethnicity, and geographic location." Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features. Table topics include: Pregnancy and Birth; Health Conditions/Risk Factors; Health Status and Disability; Health Care Access and Use; and Mortality.

ICD-10 Code Lookup Tool (Medical Billing and Coding Certification)
The ICD-10 (International Statistical Classification of Diseases - 10th Revision) is a medical classification list for the coding of diseases as maintained by the World Health Organization. The Medical Billing and Coding Certification site offers two options for looking up ICD-10 codes: a keyword search tool and a hierarchy browse. Both the searching and browsing tools offer a pop-up of US mortality data from the World Health Organization for each code. Note: The mortality-data pop-up works best with Firefox, Chrome, and the latest version of Internet Explorer.

ICPSR Student Data Sandbox (Inter-university Consortium for Political and Social Research (ICPSR))
Students and classes from ICPSR member institutions can self-publish data they generate and/or use in the ICPSR Student Data Sandbox. With this online tool, students can learn data management, generate data citations and persistent identifiers for their data, and learn from others' data.

Industry Concordances (Jon Haveman)
This wonderful site, produced by Jon Haveman, provides access to a multitude of concordances (ISIC, SITC, usSIC, cSIC, HS and others), a list of acronyms and what they stand for, and verbal descriptions of the various classification systems.

International Household Survey Network (IHSN) (International Household Survey Network (IHSN))

The IHSN is a partnership of international organizations that aims to improve the availability and quality of household survey data in developing countries. For researchers looking for microdata, the site provides a central catalog of household surveys from developing countries, with contact information for the agencies and archives responsible for the data. When links are provided, they lead to the home page of the responsible archive; not all surveys listed are publicly available or available online. The site also provides a separate list of links to archives of survey data from developing countries.

For national statistical agencies, the IHSN site provides tools and guidelines in such areas as sampling, questionnaire design, anonymization, and data archiving and dissemination. A database on planned censuses and surveys carries information about surveys that are planned or in process. A question bank is under development to help agencies harmonize their data collection efforts. The site also carries a Microdata Management Toolkit, developed by the World Bank Data Group, which includes a metadata editor and a CD-ROM builder tool. Some components of the toolkit are freely available; others require a license.

Internet for Social Statistics (Robin Rice, Edinburgh University Data Library and Intute)
The Internet for Social Statistics guide, written by Robin Rice of the Edinburgh University Data Library, offers a free tutorial on how to use social statistics. Users can tour sites for statistics, learn how to improve their data searching techniques, learn how to apply critical thinking skills to citing sources on the web, and reflect on how to use the web as a better tool for researching and teaching. This guide is part of the Virtual Training Suite.

Inter-University Consortium for Political and Social Research (ICPSR) (Inter-university Consortium for Political and Social Research (ICPSR))
The Inter-University Consortium for Political and Social Research (ICPSR), established in 1962, maintains and provides access to a vast archive of social science data for research and instruction, and offers training in quantitative methods to facilitate effective data use. To ensure that data resources are available to future generations of scholars, ICPSR preserves data, migrating them to new storage media as changes in technology warrant. In addition, ICPSR provides user support to assist researchers in identifying relevant data for analysis and in conducting their research projects. Codebooks are freely available, and data is available for download to ICPSR member institutions (DISC holds the UW-Madison ICPSR membership). Non-UW-Madison users may access the ICPSR site at

Introduction to Metadata: Pathways to Digital Information (Getty Standards Program)
Version 3.0 of Introduction to Metadata: Pathways to Digital Information has been placed online in its entirety by the Getty Standards Program. The publication now offers a "suite" of metadata crosswalks that map different sets of metadata. Also included are a glossary and list of hyperlinked acronyms. All sections of the book are available in both HTML and .pdf format.

IPUMS Higher Ed (The Minnesota Population Center, University of Minnesota)
IPUMS Higher Ed offers harmonized versions of the surveys incorporated into the NSF Scientists and Engineers Statistical Database (SESTAT), which is composed of three National Science Foundation surveys: the National Survey of College Graduates, the Survey of Doctorate Recipients, and the National Survey of Recent College Graduates. Its data includes education history, labor force status, employer and academic institution characteristics, income, and work activities. SESTAT data have been used previously to study a wide variety of topics, including gender differences in the labor force and the presence of immigrants in the U.S. science and engineering workforce.

IPUMS Multigenerational Longitudinal Panel (MLP) (Minnesota Population Center, University of Minnesota)
IPUMS USA has a set of crosswalks to link individual records in full count historical census data between adjacent censuses from 1900 to 1940. These crosswalks contain IDs that link persons between census years. They are intended for use with IPUMS data extracts. To download these data, you must agree to the IPUMS USA terms of use and register to create a user account with IPUMS.

IPUMS Time Use (
IPUMS Time Use offers three resources for studying time use. The annual American Time Use Survey (ATUS) covers time use in the United States from 2003 forward. The American Heritage Time Use Study (AHTUS) has historical American time use data since 1965, harmonized for comparison over time and including the ATUS samples. The Multinational Time Use Study (MTUS) provides time use data from around the world. Researchers can access these time diary data through a powerful web-based access system that constructs customized data sets ready for analysis. Free registration and agreement to conditions of use are required.

Longitudinal and Repeated Data Portals (Colectica Software)
This site lists several longitudinal studies on the Colectica Portal. Visitors can search data documentation across waves. Variables are harmonized and linked over time, and changes in measurements, coding, and descriptions are encoded using open data documentation standards such as DDI. Visitors need to register with each site to access data and documentation.

Malta National Statistics (National Statistical Office)
The National Statistical Office (NSO) was established in 1981 by the Statistical Service Act (Chapter 386) and became the central agency in Papua New Guinea for providing statistical information to meet the needs of the Government for the formulation of policy and planning. Under Section 106 of the 1995 Reformed Organic Law on Provincial and Local Level Government, the NSO was also given the mandate to assist in creating statistical databases at the Provincial and Local Government levels for policy formulation and planning at these levels.

Master Area Geographic Glossary Of Terms (MAGGOT) (Missouri Census Data Center)
This document supplies useful definitions of geographic units ("geocodes") commonly used in geographic databases such as MABLE/Geocorr. Included are definitions for State, County, MCD-CCD (County Subdivisions), Place, Census Tract, Block Group and Census Block.

Medical Expenditure Panel Survey (MEPS) (Agency for Healthcare Research and Quality)
"The Medical Expenditure Panel Survey, or MEPS as it is commonly called, is the third (and most recent) in a series of national probability surveys conducted by AHRQ (Agency for Healthcare Research and Quality) on the financing and utilization of medical care in the United States." A number of public use files are available for download, and some data is also available in tabular format. Online statistical tools are available for analyzing household data and employer-based insurance data.

National Collaborative on Childhood Obesity Research (NCCOR) Catalog of Surveillance Systems (National Collaborative on Childhood Obesity Research (NCCOR))
The National Collaborative on Childhood Obesity Research (NCCOR) Catalog of Surveillance Systems has 100 publicly available datasets with information on health behaviors, outcomes, determinants, policies and environmental factors. This free online resource was created for researchers and practitioners to investigate childhood obesity in America.

National Crosswalk Service Center (NCSC) (Iowa Center for Career and Occupational Resources (ICCOR))
The National Crosswalk Service Center (NCSC) specializes in occupational and training program classifications, their relationships to each other, and to related data. A "crosswalk" allows users to interpret one classification system in terms of another. NCSC makes crosswalks available for FTP download, and also serves as a depository of other computerized occupational and educational information resources.
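
As a rough illustration of the crosswalk idea described above, the sketch below stores a crosswalk as a lookup table from one classification system to another. The codes and the names `OLD_TO_NEW`, `translate`, and `invert` are hypothetical, invented for this example; they are not taken from any NCSC file.

```python
from collections import defaultdict

# Hypothetical codes, invented for illustration; not real classification codes.
# One source code may correspond to several target codes, so values are lists.
OLD_TO_NEW = {
    "A10": ["X1"],
    "A20": ["X2", "X3"],
    "A30": ["X3"],
}


def translate(code, crosswalk):
    """Return the target-system codes for one source-system code."""
    return crosswalk.get(code, [])


def invert(crosswalk):
    """Build the reverse crosswalk (target code -> list of source codes)."""
    reverse = defaultdict(list)
    for source, targets in crosswalk.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


if __name__ == "__main__":
    print(translate("A20", OLD_TO_NEW))
    print(invert(OLD_TO_NEW))
```

Real crosswalk files often carry extra columns, such as titles or allocation weights for one-to-many matches, which this sketch leaves out.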

NCES Handbook of Survey Methods (National Center for Education Statistics (NCES), U.S. Department of Education)
The NCES Handbook of Survey Methods explains how the National Center for Education Statistics (NCES) obtains and prepares the data it publishes for each of its survey programs.

Networked Social Science Tools And Resources (NESSTAR) (Networked Social Science Tools And Resources)
NESSTAR is a system for data discovery, location, browsing and dissemination via the Internet. A web-based browser interface called NESSTAR Light lets users do simple analyses and download data that has been mounted on a NESSTAR server. Data Archives currently offering data through NESSTAR include the UK Data Archive, Norwegian Social Science Data Services (NSD), Danish Data Archive (DDA) and the Finnish Social Science Data Archive (FSD).

NLS Investigator (Center for Human Resource Research (CHRR), Ohio State University)
The NLS Investigator is a web-based interface to documentation and data from all the cohorts of the National Longitudinal Surveys (NLS). Like its predecessor Web-Investigator, NLS Investigator allows users to search the database by variable name, question text, survey year and question number. Users can view the codebook information associated with variables, select and extract variables, and create a codebook unique to the variables chosen. Investigator provides value labels in the statistical results files. A weighting program option lets users create a custom set of survey weights, making it easier to accurately calculate summary statistics from multiple years of data. Registered users can perform variable extractions without downloading any software or full data files, and can update and save their tag sets on the server for up to 90 days. Result files can be saved to a local computer or left in a personal NLS Investigator account for up to 4 days. Note: the old Web-Investigator version will be disabled after October 29, 2010. Users who are using the new NLS Investigator for the first time will have to complete a one-time free re-registration.

Open Calls for Comment on Federal Data Collections (Association of Public Data Users (APDU))
The Association of Public Data Users provides an up-to-date spreadsheet, linked from this page, of current statistical issues for public comment as announced in the notices of the Federal Register. Calls for comment are organized first by agency, then by closing date for comments. Each call for comment is linked to the Federal Register page, along with a contact person. A second page within the spreadsheet lists already-expired calls for comment.

OpenICPSR (Inter-university Consortium for Political and Social Research (ICPSR))
OpenICPSR is a self-serve data repository for researchers who need to deposit their social and behavioral science research data for public access compliance. Researchers can share up to 2 GB of data in OpenICPSR for free. Researchers prepare all data and documentation files necessary to allow their data collection to be read and interpreted independently. They also prepare metadata to allow their data to be searched and discovered in the ICPSR catalog and major search engines. A DOI and a data citation will be provided to the depositor after data are published. Depositors will receive data download reports from OpenICPSR.

Penn World Tables (PWT) (University of Groningen)
The Penn World Table (PWT) is a set of national-accounts data developed to measure real GDP across countries and over time. PWT allows for comparisons of relative GDP per capita, as a measure of standard of living, the productive capacity of economies and their productivity level.

Population Research Institute (PRI) (Pennsylvania State University)
The Population Research Institute (PRI) at Penn State focuses on research and training in the population sciences. Their initiatives include providing a variety of

Postsecondary Education Transcript Studies (PETS) (National Center for Education Statistics (NCES), U.S. Department of Education)
Many NCES longitudinal studies have collected postsecondary transcripts, beginning with the National Longitudinal Study of 1972, the transcripts for which were collected in 1984. These transcript data provide an important analytic resource. Researchers can study course-taking patterns, credit transfer, student momentum and attrition, and the connection among course and major choices, occupations, and wages. This tutorial gives an introduction to PETS.

Sample Size Calculator (Creative Research Systems)
This site contains two calculators: one for determining the necessary sample size for a given population and desired confidence interval, and one for calculating the confidence interval for a given population and sample size.
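
For readers curious about the arithmetic behind calculators of this kind, here is a minimal sketch. It assumes the textbook formula for estimating a proportion with a finite population correction, and it treats "confidence interval" as the plus-or-minus margin of error; the site's exact method is not documented here, and the function names and defaults are illustrative choices.

```python
import math

# z-scores for common confidence levels (standard normal quantiles).
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(population, margin, confidence=0.95, p=0.5):
    """Sample size needed for a given margin of error (as a fraction).

    p = 0.5 is the most conservative assumption about the true proportion.
    """
    z = Z_SCORES[confidence]
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)


def margin_of_error(population, n, confidence=0.95, p=0.5):
    """Margin of error (as a fraction) for a sample of n from the population."""
    z = Z_SCORES[confidence]
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


if __name__ == "__main__":
    n = sample_size(10_000, 0.05)             # 370
    print(n, round(margin_of_error(10_000, n), 4))
```

Under these assumptions, a population of 10,000 with a 5-point margin at 95% confidence calls for a sample of 370; feeding that sample size back into the second function returns a margin of about 0.05.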

Scholars' Lab (University of Virginia)
Scholars’ Lab at the University of Virginia Library is set up to assist advanced students and researchers on their digital projects. Their faculty and staff focus on the digital humanities, geospatial information, and scholarly making and building at the intersection of our digital and physical worlds.

SDMX - Statistical Data and Metadata Exchange (SDMX)
Statistical Data and Metadata eXchange (SDMX) is an international initiative that aims at standardising and modernising the mechanisms and processes for the exchange of statistical data and metadata among international organisations and their member countries. SDMX is sponsored by seven international organisations including the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat (Statistical Office of the European Union), the International Monetary Fund (IMF), the Organisation for Economic Cooperation and Development (OECD), the United Nations Statistical Division (UNSD), and the World Bank.

Social Research Update (University of Surrey, United Kingdom)
This general reference periodical for beginning social science researchers is issued quarterly by the Department of Sociology, University of Surrey, Guildford, England. Previous issues have included such topics as Ethnographic writing, Archiving qualitative research data, and Secondary analysis of qualitative data.

Social Science & Government Data Library (University of California, Berkeley)

The Social Science and Government Data Library (SSGDL) is a collaboration between the UC-Berkeley Library and UC DATA on the University of California, Berkeley campus. The SSGDL web site carries both an extraction system and FTP links for U.S. Census Data. The extraction system contains 1990 census data from SSTF1, SSTF2 (Ancestry of the Population of the US), SSTF3 (Persons of Hispanic Origin in the United States), and SSTF5 (Characteristics of Asian and Pacific Islander Population of the US). Users can pick both geographies and variables.

The FTP files available from the site include:

  • Census 2000: Summary File 1 (SF1), Redistricting Data (P.L. 94-171), Race and Hispanic or Latino Summary
  • 1990 Census: Congressional Districts in the U.S., Equal Employment Opportunity File, Public Law 94-171 data, Public Use Microdata Samples (PUMS) - 1% and 5% data, Summary Tape File 1B (includes PR files), Summary Tape File 3 (includes 3A, 3B, and 3C), Subject Summary Tape Files
  • 1970 Census Fifth Count Special Tabulation
  • County & City Databook 1988 and 1994
  • Current Population Survey files between 1988 and 1993
  • Economic Census Data (1987, 1992, 1997)
  • TIGER/Line 1997 files

Downloaded FTP files use the "Go" extraction system.

Social Science Japan Data Archive (SSJDA) (Center for Social Research and Data Archives, Institute of Social Science, University of Tokyo)
The Social Science Japan Data Archive (SSJDA) is affiliated with the Center for Social Research and Data Archives in the Institute of Social Science (ISS) at the University of Tokyo. SSJDA collects, maintains, and provides access to social science data to researchers who are interested in Japanese quantitative data for secondary analyses. Users are required to fill out online applications and get approval before they can access datasets housed in SSJDA. Most of the datasets are in Japanese.

Society for Political Methodology (American Political Science Association)
The Political Methodology section of the American Political Science Association offers a hyperlink to the Political Methodology Working Papers, 1995 to the present. Authors submit papers to be downloaded, abstracts are readable on the web, and the full text of the paper may be read or downloaded in PDF.

STATS (The Statistical Assessment Service)
The Statistical Assessment Service "examines the way that scientific, quantitative, and social research are presented by the media, and works with journalists to help them convey this material more accurately and effectively." STATS sets the record straight in an engaging, direct manner.

SuperSTAR (Space-Time Research (Melbourne, Australia) and Alta Plana Corporation)
The SuperSTAR statistical tabulation suite is used for analysis and dissemination of demographic, social, survey, trade, and marketing data. The suite includes the SuperCROSS Windows module and the SuperWEB browser interface and provides integrated statistical calculations, confidentiality algorithms, charting, mapping, and data extraction. It runs against micro-data and summarized data cubes with a multi-lingual interface and support for multi-lingual metainformation. Note: this is a fee-based product.

Survey Documentation and Analysis (SDA) Archive (University of California-Berkeley)
SDA is a set of programs for the documentation and analysis of survey data. The programs can produce codebooks either for printing or for browsing on the World Wide Web. Data analysis programs in the package can be run in various ways, including online from a Web browser. Data available here include: GSS (General Social Survey) 1972-2004 Cumulative Datafile; NES (US National Election Study) cumulative back to 1948 plus individual years since 1996; some census microdata from the US and California; several Labor and Health surveys; and several surveys on racist attitudes and prejudice. The site also links to other data archives that use SDA. Also included at this site is information on the Data Documentation Initiative (DDI) and Instrument Documentation (IDOC), a project to develop network-browsable documentation for CAI instruments, including the SIPP.

Teaching with Data (Inter-University Consortium for Political and Social Research (ICPSR) and Social Science Data Analysis Network (SSDAN))
The Teaching With Data web site offers annotated links to data-driven teaching materials primarily aimed at the undergraduate level, though the site-wide search tool includes a K-12 option. Classroom resources include lessons and lectures, exercises and modules, syllabi and reading lists. Data resources include both tabular and downloadable data, data-based maps, and links to various data archives. Tools for analysis, visualization and course development are highlighted as well. Users can browse the site by discipline: anthropology, economics, environmental sciences, geography, history, political science, public policy, social work, and sociology. A "Data in the News" feature links the site to current events. Teaching With Data is a partnership between ICPSR and the Social Science Data Analysis Network (SSDAN), both at the University of Michigan. The project is funded by the National Science Foundation.

The Center for Spatially Integrated Social Science (National Science Foundation)
The CSISS site focuses on the importance of space, location, and place in social science research. The site features learning tools and bibliographies regarding GIS and social sciences, as well as a search engine and annotated links to spatial tools elsewhere on the web. In development is a data search engine intended for searching across social science data archives.

The Higher Education Research Institute (University of California-Los Angeles)
The Higher Education Research Institute is an "interdisciplinary center for research, evaluation, information, policy studies, and research training in postsecondary education." Based in the University of California-Los Angeles Graduate School of Education and Information Studies, the HERI sponsors the CIRP survey of college freshmen as well as the HERI Faculty Survey and the CSS (College Senior Survey). Sample formats of the CSS and CIRP (Cooperative Institutional Research Program) are available on the site in PDF. The site also includes links to recent research at the Institute as well as Institute publications.

The National Counterterrorism Center (Office of the Director of National Intelligence)
This website provides information about the history, mission, purpose, and organizational structure of the National Counterterrorism Center. It includes online resources such as counterterrorism and intelligence guides, speeches, security policy documents, and information about partnering agencies.

The Neighborhood Atlas (Amy J.H. Kind)
This online interactive tool allows visitors to rank and map neighborhoods according to socioeconomic disadvantage metrics. Researchers can download data and merge them with other data sources to examine how neighborhood disadvantage impacts health. Neighborhood Atlas updates the Area Deprivation Index (ADI), a measure created by the Health Resources & Services Administration (HRSA), and extends it to ZIP+4 codes. It presents data on socio-economic factors including poverty, education, housing and employment indicators drawn from US Census data. The Neighborhood Atlas is free but requires visitors to register before they can download data.

TranStats (U.S. Bureau of Transportation Statistics)
TranStats comprises a broad collection of over 100 transportation datasets from various federal sources such as the Department of Transportation and the Census Bureau. TranStats is searchable by keyword or category. Some of the data descriptions link to data stored on other sites; for the many datasets stored at TranStats, however, users have interactive control over which variables to download, in addition to interactive analysis tools (producing simple statistical summaries, creating time series or cross tabulations, generating graphics online, and cutting and pasting results into reports). A "mapping center" is also available through TranStats, carrying the National Transportation Atlas Databases (NTAD) and other transportation mapping tools. Note: TranStats was formerly known as the Intermodal Transportation Database.

Trends in Health and Aging (U.S. National Center for Health Statistics)
This site presents a collection of tables on trends in the health of older Americans showing data by age, sex, race and Hispanic origin. Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features, including mapping and statistical tests. Tables are categorized into 19 topics: Chronic Conditions, Functional Status and Disability, Health Care Expenditures, Health Care Utilization, Health Insurance, Immunization, Incontinence, Injury, Life Expectancy, Living Arrangements, Mental Health, Mortality, Oral Health, Perceived Health Status, Population (Nation and State), Risk Factors, Socio-Economic Status, Special Equipment Use, and Use and Cost of Prescription Medication.

U.S. Demography (CIESIN)
Included here are informative explanations of the following datasets: Public Use Microdata Samples, Current Population Survey, Economic Census Data, County Business Patterns, County City Data Book, Statistical Abstract Supplement, National Economic Social and Environmental Databank, Regional Economic Information System, Enhanced County to County Migration 1985-1990, TIGER 1992 Boundaries, and STF3A Standard Extracts.

Uexplore / Dexter (Missouri Census Data Center)
Uexplore / Dexter is a web application in the Missouri Census Data Center's public data archive. Uexplore lets visitors navigate through the directories to locate U.S. Census and other public data files for further processing. Dexter is an application for data extraction. Dexter makes it easy to create subsets, but it can be challenging to get exactly what a user wants. Review this Dexter guide before you extract any data.

UN Classifications Registry (United Nations Statistics Division)
The Classifications registry keeps updated information on Statistical Classifications maintained by the United Nations Statistics Division (UNSD). Downloadable classifications include International Standard Industrial Classification of All Economic Activities (ISIC), Central Product Classification (CPC), Standard International Trade Classification (SITC), Classification by Broad Economic Categories (BEC), Classification of the Functions of Government (COFOG), Classification of Individual Consumption According to Purpose (COICOP), Classification of the Purposes of Non-Profit Institutions Serving Households (COPNI), Classification of the Outlays of Producers According to Purpose (COPP) and International Classification of Activities for Time-Use Statistics (ICATUS). Rulings, corrections, interpretations and proposals for future revisions are recorded and can be viewed from the Registry entries link at this site. The National Classifications section includes information on national practices in the area of classifications, covering activity and product classifications used in countries around the world.

Understanding the 1990 Public Use Microdata Sample (PUMS) (UCLA)
This document presents a general overview of PUMS, while specifically discussing the distribution of census questionnaires, privacy protection, selection of PUMS 5% data, and structure of the 1990 PUMS 5% data which includes geographic, household, and person information.

Variable and Question Bank (UK Data Archive (UKDA) at the University of Essex)
The Variable and Question Bank is a reference source for question formats and wordings used on major social surveys in the UK. It provides supporting material on concepts and methodology, and aims to disseminate knowledge about survey data collection methods to achieve comparability of results.

WebCASPAR (National Science Foundation)
WebCASPAR bills itself as an "integrated science and engineering resources data system." The database system is a collection of statistical data from several NSF and NCES surveys in higher education, accessed via a web-based extraction form that lets users create tables (or view pre-defined tables). It includes institution-level data. Free registration is required to customize the search fully.

Western Libraries Map and Data Centre (University of Western Ontario, Canada)
The former Data Resources Library at the University of Western Ontario, Canada has joined forces with the Serge A. Sauer Map Library to form the Western Libraries Map and Data Centre. The holdings and services of the two former entities will be combined, in the mission "to deliver map, GIS and data services to the Western community of students, staff, and faculty."

WomanStats (The WomanStats Project)
WomanStats is a comprehensive database on the situation and status of women in the world. It covers issues such as rape, sex trafficking, maternal and child mortality, family law, women in government and the military, and many others for countries with populations greater than 200,000 persons. It includes quantitative data, qualitative information, current legal statutes, and scales created by the WomanStats researchers for cross-national comparison. Over 350 variables are organized by nine categories:

  • Women's Physical Security
  • Women's Economic Security
  • Women's Legal Security
  • Women's Security in the Community
  • Women's Security in the Family
  • Security for Maternity
  • Women's Security Through Voice
  • Security Through Societal Investment in Women
  • Women's Security in the State
To download data, you need to register with the site.

World Health Organization Statistical Information System (WHOSIS) (World Health Organization)
Provides searching and browsing options for finding international health-related statistics on the WHO web site and beyond. Online databases accessible from the WHOSIS page include Core Health Indicators; Life Tables; Mortality; Tuberculosis; HIV/AIDS; Alcohol; and Global Health Atlas. The WHO data offerings are more extensive than is immediately apparent. Users may want to use the site-search function on the WHOSIS page. The site also offers what they call a WHOSIS query service, consisting of a Frequently Asked Questions document and a contact form to send a question to WHO staff.

Worldmapper (University of Sheffield (UK) and University of Michigan)
The Worldmapper site takes its catchphrase, "The World as You've Never Seen It Before," and puts it into data-driven action, featuring cartograms that display global regions "re-sized according to the subject of interest." A total-population world map, for example, displays India and Japan swollen to outsized proportions, while the United States looms large on the map of private spending on health-care and Southeastern Africa dominates the map of HIV prevalence. Some of the broad topics include health, education, transportation, communication, work, and housing, but the list continues to expand. Each map comes with a downloadable PDF poster and downloadable data files in Excel and OpenDoc format. The Worldmapper project is a collaboration between the University of Sheffield (UK) which hosts the site, and the University of Michigan.

ZIP Code Resources Page (MABLE/Geocorr)
This page describes a series of tools for helping users deal with 5-digit U.S. postal ZIP code areas. It focuses primarily on tools for linking ZIP codes to other geographies (such as counties, cities, metro areas, ZCTAs) and to demographic information from the 2000 decennial census. A supplement was created to address ZCTAs from the 2010 census, which can be found here: All About Zip Codes: 2010 Supplement. The page offers an excellent explanation of the "messiness" of using the ZIP code as a geographic unit.
