Data Extraction Tools and Methodology
A Cure for the Common Codes (Missouri Census Data Center (MCDC)) John Blodgett, author of the Common Codes site, has assembled a set of commonly-used geographic codes and displayed them compactly, one page for each U.S. state and the District of Columbia. Contents include: counties, places (cities), county subdivisions, various kinds of metropolitan/micropolitan areas, urban clusters and urbanized areas, and school districts. The codes are almost all FIPS codes, with the exception of school districts. The main page also has a very nice summary of the evolution of the Metropolitan Statistical Area and the corresponding coding. American FactFinder (U.S. Bureau of the Census) American FactFinder is the primary online census data dissemination site from the U.S. Bureau of the Census. The interface lets users browse, search, and map data from the 1990 Census, the 1997 Economic Census, the American Community Survey, and Census 2000. ArcExplorer: Free GIS Data Viewer (Environmental Systems Research Institute, Inc.) In addition to the ArcExplorer software, this site also offers information about GIS, downloadable basemap and thematic world datafiles and more. Area Resource File (ARF) (Quality Resource Systems, Inc.) “The Area Resource File (ARF) is a database containing over 6,000 variables for each county in the US. ARF is used for health service research, health policy analysis, and other geographically based activities.” The ARF data from 1940-1994, as well as the 1999 and 2005 releases, are available at DISC for UW-Madison campus users. The ARF website provides a search engine to identify which variables are available in the most recent annual release. 
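Several resources on this page identify U.S. counties by FIPS code. A county FIPS code is five digits: a two-digit state code followed by a three-digit county code. A common problem when these codes pass through a spreadsheet is that the leading zero is dropped. The sketch below shows one way to repair and split such codes; the function names are illustrative and are not part of any tool listed here.

```python
def normalize_county_fips(value):
    """Return a 5-character county FIPS string, restoring leading zeros."""
    text = str(value).strip()
    # County FIPS codes are all digits; anything else is rejected here.
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_county_fips(value):
    """Split a county FIPS code into (state_code, county_code)."""
    code = normalize_county_fips(value)
    return code[:2], code[2:]


# A code read as a number has lost its leading zero; it is restored first.
print(split_county_fips(1001))
# A code already stored as text passes through unchanged.
print(split_county_fips("55025"))
```

Storing the codes as text rather than numbers avoids the problem in the first place.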
Argentina National Institute of Statistics and Censuses (INDEC) (Argentina National Institute of Statistics and Censuses) The Argentina National Institute of Statistics and Censuses "is the technical government agency responsible for the coordination and supervision of all public statistical activities taking place in the Argentine territory." Includes tables in Excel. (This site is in Spanish, with an abridged English version.) Basic Tables: 1990 Demographic Profile Generator (University of Missouri-St. Louis, Urban Information Center) "This application generates a single 1990 'Basic Tables' (demographic profile) report for any of the supported geographic units, including census tract, block group, city (no size limit), 5-digit ZIP code, state, county or metro area for anywhere in the United States. Enter only codes relevant to the area for which you want data." Although this is a terrific resource, it is not necessarily easy to use, primarily because areas are selected by entering geographic codes rather than by clicking on a selection list. The good news is that there is a Lots of Helpful Examples document to help get you started; this document provides links to places that can help you get the codes you need. CDC WONDER (U.S. Dept. of Health and Human Services, Centers for Disease Control) CDC WONDER provides a gateway to a wide variety of reports and numeric public health data. Many of the links are to menu-based extraction systems that produce downloadable summary data tables. The gateway covers the following categories: chronic diseases, communicable diseases, environmental health, health practice and prevention, injury prevention, and occupational health. The site also has an A-to-Z topic index. CensusScope (Social Science Data Analysis Network) The Social Science Data Analysis Network (SSDAN) at the University of Michigan offers this point-and-click interface to Census 2000 data. 
Pre-selected topics for charts, maps, and trends let the user choose a state or metro area using drop-down menus. The graphics are eye-catching and suitable for printing. Center for Social Research Methods (William M.K. Trochim) This site offers materials and links for people involved in social research. Site highlights include The Knowledge Base, an online hypertext textbook on applied social research methods such as defining a research question, sampling, measurement, research design and data analysis; a simulation book of manual (such as dice-rolling) and computer simulation exercises of common research designs; and a statistical advisor that points users toward appropriate statistical measures based on answers to a series of questions. William Trochim, author of the site, is a professor at Cornell University. Chance Course & Database (Dartmouth College) The Chance Project was funded by the National Science Foundation from 1992 to 1996 to develop instructional materials for teaching basic probability and statistical concepts using examples drawn from current news and the real world. The Chance website contains a teacher’s guide, syllabi and lecture notes, activities and datasets, and a current newsletter (web or e-mail) that culls up-to-date examples from current news reporting. Conducting Research Surveys via Email and the Web (RAND Corporation and Matthias Schonlau, Ronald D. Fricker, Jr., Marc N. Elliott) This 118-page publication examines the burgeoning trend of online research surveys. The authors carry out a literature review, discuss the advantages and disadvantages of online surveys, and offer practical suggestions for design and implementation. A chapter of case studies rounds out the publication. The seven chapters and three appendices may be downloaded together or separately in PDF. CSPro (U.S. Bureau of the Census) CSPro is a public domain software package for entering, editing, tabulating and mapping census and survey data. 
The software is reportedly widely used by national statistical agencies in developing countries for the purposes of censuses and household surveys. Free registration required for download. Data.gov (U.S. Office of Management & Budget) Launched in May 2009, Data.gov is an initiative of the Obama administration designed to "increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government." While the interface is likely to change rapidly over time, as of June 2009 the site contained three searchable catalogs: a "raw data" catalog, a tool catalog, and a geodata catalog. Searches can be done by category, agency, or keyword. The project is also working with state and local agencies to launch similar initiatives, and links to such initiatives via a clickable map. Users may comment on or give a 1-to-5-star rating on entries in the Data.gov catalog, and may also make suggestions for additions to the site. An alternate place for discussion and recommendations for Data.gov is a wiki hosted by Wired, at http://howto.wired.com/wiki/Open_Up_Government_Data DataFerrett (U.S. Bureau of the Census and Centers for Disease Control) The DataWeb site, featuring the DataFerrett application, is intended as a successor to the now-defunct FERRETT application. This downloadable data browser runs from the desktop and uses a graphic interface for browsing and searching through variables in the available datasets, downloading subsets, and generating tables. Currently available data includes: Current Population Survey, Survey of Income and Program Participation, American Community Survey, American Housing Survey, Small Area Income Poverty Estimates, Population Estimates, Economic Census Areawide Statistics, National Center for Health Statistics data, and Centers for Disease Control data. Most recently added data includes U.S. Census SF1 & SF3 for 1990 and 2000, and 1990 (1% & 5%) and 2000 (1%) PUMS as well. 
Digital Chart of the World (DCW) (Pennsylvania State University Libraries) Download the boundaries and layers of individual countries, in Arc/INFO export format, and preview the data online. Note that national boundaries reflect political reality as of 1991/92. For more information about the DCW, see The Digital Chart of the World (DCW) & Data Quality Project at the Agricultural University of Norway. Dissemination Standards Bulletin Board (DSBB) (International Monetary Fund) The DSBB provides information about the Special Data Dissemination Standard (SDDS), established in 1996 to guide countries that have, or that might seek, access to international capital markets in the dissemination of economic and financial data to the public and the General Data Dissemination System (GDDS), established in 1997 to guide countries in providing comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data to the public. Econometrics Laboratory Software Archive (ELSA) (Econometrics Software Laboratory Archive, University of California, Berkeley) The Econometrics Software Laboratory Archive at the University of California, Berkeley strives to facilitate the interchange of computational algorithms that have economic applications. ELSA makes available a variety of algorithms, programs, software manuals, and econometrics-related datasets and textbooks available for download. Datasets include, among others, Lorna Greening's Integrated Consumer Expenditure Survey data files for 1980-1994; David Card's collection of 1970 Census: raw (state) files and extracts; and several of David Card's collections of Current Population Survey files. Extract Software and Documentation for Census CD-ROMs (U.S. Bureau of the Census) EXTRACT is a general purpose data display and extraction tool that works with Census Bureau CD-ROMs recorded in dBASE format (such as 1990 STFs and EEO files; County and City Data Book 1994 and 1998; and others). 
This page provides everything needed to run EXTRACT with each Census CD-ROM: downloadable software, auxiliary files needed for particular datasets, and documentation. FAIRMODEL Economic Model (Yale University) The FAIRMODEL economic model from Yale provides macroeconomic analysis for free. The site allows users to, "Work with a U.S. macroeconometric model (US model) or a multicountry econometric model (MC3 model) to forecast, do policy analysis, and examine historical episodes. Users can change government policy variables and examine the estimated effects of the changes, table and graph online and/or download all or part of the historical data, forecast data, and data you may have created, read online and/or download all the documentation, memos, and paper. Download for use on your own computer: the Fair-Parke (FP) program, the US model, the MC3 model, and the US model in EViews format. Users can also analyze a presidential vote equation, including examining Bush's chances in 2004, and perform stock market experiments." Federal Justice Statistics Resource Center (FJSRC) (U.S. Bureau of Justice Statistics and Urban Institute) The U.S. Bureau of Justice Statistics through the Federal Justice Statistics Resource Center "compiles comprehensive information describing suspects and defendants processed in the Federal criminal justice system. The goal of FJSRC is to provide uniform case processing statistics across all stages of the Federal criminal justice system. Using data obtained from Federal agencies, FJSRC compiles comprehensive information that describes person-cases processed through the system." Users can download compressed ASCII versions of Standard Analysis File (SAF) data sets (free registration required), generally going back as far as 1994. For fiscal years 1978-1994, the SAFs are archived at the National Archive of Criminal Justice Data (http://www.icpsr.umich.edu/NACJD/) as ICPSR Study Number 9296. 
The site offers a menu-based tablemaker, described as “Online Analysis,” for displaying and downloading tables from selected data sets going back to the year 2000. In addition, the site carries an archive of publications including annual, technical and special topic reports. Federal Information Processing Standards (FIPS) Codes (U.S. Bureau of the Census) Federal Information Processing Standards codes (FIPS codes) are a standardized set of alphanumeric codes issued by the United States National Institute of Standards and Technology (NIST) to ensure uniform identification of geographic entities across all U.S. federal government agencies. The Census Bureau, as a major user of FIPS codes, provides this page with links to FIPS look-up utilities and NIST publications regarding FIPS codes. Of particular interest: the FIPS PUB 6-4 Lookup for Counties, which also includes a national file for all the FIPS county codes; and the Census 2000 FIPS 55 codes for county subdivisions and places, also with a lookup utility and national-file download. FlowingData (Nathan Yau)
Since mid-2007 Nathan Yau, a PhD candidate in Statistics at UCLA, has been running the FlowingData blog. His interest is in data visualization, and his blog has attracted like-minded data enthusiasts, who carry on a lively conversation, with lots of thought-provoking images and animations, about how data can be presented.
One useful category of post holds up data graphics from the media for critique by the FlowingData community. Another category presents visualizations created by the blog author himself, with animations on such topics as mapping the expansion of WalMart in the United States over time, and mapping the use of the word “inauguration” in Twitter messages worldwide in the hours surrounding the events of January 20, 2009 in Washington DC. FlowingData also holds contests for its readers to contribute visualizations based on a given dataset, while a forum page adds opportunities for reader input. General Social Survey Data and Information Retrieval System (GSSDIRS) (Interuniversity Consortium for Political and Social Research (ICPSR)) The GSSDIRS site allows users to search all GSS documents, browse the GSS Codebooks (1972-2000), perform online subsetting, and generate cross-tabs from the GSS. The codebook includes frequencies of response over time and has links to the annotated bibliography of studies that have analyzed each variable. The online subsetting form allows you to limit cases by year or any other variable, then name the variables you wish to extract into a plain ASCII, SAS, or SPSS transport file. When typing in variable names, make sure there is a blank space after each name (even when you hit return for the next line) and that the lines do not exceed 77 characters in length. GIS Guide to Good Practice (Arts and Humanities Data Service (AHDS)) This UK-based site is for those who create, maintain, use and preserve GIS-based digital resources. Although the overall emphasis is upon archaeological data, the information presented has much wider disciplinary implications. As well as providing a source of useful generic information, the guide emphasises the processes of long-term preservation, archiving and effective data re-use. 
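The GSSDIRS entry above notes two formatting rules for the subsetting form: every variable name must be followed by a blank, and no line may exceed 77 characters. A long variable list can be prepared ahead of time with a small helper such as the sketch below. It assumes the 77-character limit counts the trailing blank, and the function name and sample variable names are illustrative only.

```python
def wrap_variable_names(names, max_width=77):
    """Pack names into lines; each name is followed by exactly one blank."""
    lines = []
    current = ""
    for name in names:
        token = name + " "  # the trailing blank after every name
        if len(token) > max_width:
            raise ValueError(f"name too long for one line: {name!r}")
        # Start a new line when adding this name would pass the limit.
        if len(current) + len(token) > max_width:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines


# Illustrative list; a small width is used so the wrapping is visible.
for line in wrap_variable_names(["AGE", "SEX", "EDUC"], max_width=10):
    print(repr(line))
```

The resulting lines can be pasted into the form one after another, since each already ends with the required blank.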
Great Britain Historical Database (Humphrey Southall, David Gilbert, Ian Gregory and History Data Service, UK Data Archive) The Great Britain Historical Database is part of a larger project called the Great Britain Historical GIS Project, online at http://www.gbhgis.org/. The Historical Database is a large integrated database of geographically-located historical statistics for Great Britain, mainly drawn from the period 1851-1939. A geographical information system (GIS) is linked to the database and permits many of the statistics to be mapped at county or more local levels. The site provides detailed documentation. Access requires online registration with the Economic and Social Data Service, and some limitations apply. Health Data for All Ages (HDAA) (U.S. National Center for Health Statistics) According to the HDAA site, "this site presents tables that provide CDC health statistics for infants, children, adolescents, adults, and older adults. You can customize tables with any or all of the following characteristics: age, gender, race/ethnicity, and geographic location." Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features. Table topics include: Pregnancy and Birth; Health Conditions/Risk Factors; Health Status and Disability; Health Care Access and Use; and Mortality. Industry Concordances (Jon Haveman) A wonderful site produced by Jon Haveman that provides access to a multitude of concordances (ISIC, SITC, usSIC, cSIC, HS and others) plus a list of acronyms and what they stand for AND verbal descriptions of the various classification systems. International Household Survey Network (IHSN) (International Household Survey Network (IHSN))
The IHSN is a partnership of international organizations that aims to improve the availability and quality of household survey data in developing countries. For researchers looking for microdata, the site provides a central catalog of household surveys from developing countries, with contact information for the agencies and archives responsible for the data. When links are provided, they lead to the home page of the responsible archive; not all surveys listed are publicly available or available online. The site also provides a separate list of links to archives of survey data from developing countries.
For national statistical agencies, the IHSN site provides tools and guidelines in such areas as sampling, questionnaire design, anonymization, data archiving & dissemination. A database on planned censuses and surveys carries information about surveys that are planned or in process. A question-bank is under development, to help agencies harmonize their data collection efforts. The site also carries a Microdata Management Toolkit, developed by the World Bank Data Group, which includes a metadata editor and a CD-ROM builder tool. Some components of the toolkit are freely available, others require a license. International Survey Center: Survey Design and Statistical Analysis in Many Nations (International Survey Center, Australia) The International Survey Center conducts research on social, economic and political issues using survey data from large, representative national samples from many nations. Some of their data is freely available and is in the form of SPSS "portable" files. '2AF' Secondary Analysis Files are available from several cross-national projects. These include data that have been carefully worked over to make them comparable between nations and to make them user-friendly. Internet for Social Statistics (Robin Rice, Edinburgh University Data Library and Intute) The Internet for Social Statistics guide, which was written by Robin Rice of the Edinburgh University Data Library and substantially updated in 2006, offers a free tutorial on how to use social statistics. Users can tour sites for statistics, learn how to improve their data searching techniques, learn how to apply critical thinking skills to citing sources on the web, and reflect on how to use the web as a better tool for researching and teaching. This guide is now part of the Intute Virtual Training Suite. 
Inter-University Consortium for Political and Social Research (ICPSR) (Inter-university Consortium for Political and Social Research (ICPSR)) The Inter-University Consortium for Political and Social Research (ICPSR), established in 1962, maintains and provides access to a vast archive of social science data for research and instruction, and offers training in quantitative methods to facilitate effective data use. To ensure that data resources are available to future generations of scholars, ICPSR preserves data, migrating them to new storage media as changes in technology warrant. In addition, ICPSR provides user support to assist researchers in identifying relevant data for analysis and in conducting their research projects. Codebooks are freely available, and data are available for download to ICPSR member institutions (DISC holds the UW-Madison ICPSR membership). Non-UW-Madison users may access the ICPSR site at http://www.icpsr.umich.edu/. Introduction to Metadata: Pathways to Digital Information (Getty Standards Program) Version 2.1 of Introduction to Metadata: Pathways to Digital Information has been placed online in its entirety by the Getty Standards Program. The publication now offers a "suite" of metadata crosswalks that map different sets of metadata. Also included are a glossary and list of hyperlinked acronyms. All sections of the book are available in both HTML and .pdf format. Malta National Statistics (National Statistical Office) The National Statistical Office (NSO) was established in 1981 by the Statistical Service Act (Chapter 386) and became the central agency in Papua New Guinea for providing statistical information to meet the needs of the Government for the formulation of policy and planning. 
Under Section 106 of the 1995 Reformed Organic Law on Provincial and Local Level Government, the NSO was also given the mandate to assist in creating statistical databases at the Provincial and Local Government levels for policy formulation and planning at these levels. Master Area Geographic Glossary Of Terms (MAGGOT) (University of Missouri) This document supplies useful definitions of geographic units ("geocodes") commonly used in geographic databases such as MABLE/Geocorr. Included are definitions for State, County, MCD-CCD (County Subdivisions), Place, Census Tract, Block Group and Census Block. Medical Expenditure Panel Survey (MEPS) (Agency for Healthcare Research and Quality) “The Medical Expenditure Panel Survey, or MEPS as it is commonly called, is the third (and most recent) in a series of national probability surveys conducted by AHRQ (Agency for Healthcare Research and Quality) on the financing and utilization of medical care in the United States.” A number of public use files are available for download, and some data are also available in tabular format. Online statistical tools are available for analyzing household data and employer-based insurance data. National Crosswalk Service Center (NCSC) (Iowa Center for Career and Occupational Resources (ICCOR)) The National Crosswalk Service Center (NCSC) specializes in occupational and training program classifications, their relationships to each other, and to related data. A “crosswalk” allows users to interpret one classification system in terms of another. NCSC makes crosswalks available for FTP download, and also serves as a depository of other computerized occupational and educational information resources. NCES Handbook of Survey Methods (National Center for Education Statistics (NCES), U.S. Department of Education) The NCES Handbook of Survey Methods explains how the National Center for Education Statistics (NCES) obtains and prepares the data it publishes for each of its survey programs. Networked Social Science Tools And Resources (NESSTAR) (Networked Social Science Tools And Resources) NESSTAR is a system for data discovery, location, browsing and dissemination via the Internet. A web-based browser interface called NESSTAR Light lets users do simple analyses and download data that has been mounted on a NESSTAR server. Data archives currently offering data through NESSTAR include the UK Data Archive, Norwegian Social Science Data Services (NSD), Danish Data Archive (DDA) and the Finnish Social Science Data Archive (FSD). 
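The crosswalk idea described in the National Crosswalk Service Center entry above can be illustrated with a short sketch. A crosswalk is essentially a lookup table from codes in one classification to codes in another, and one source code may map to several target codes. The codes and function names below are invented for illustration and do not come from any NCSC file.

```python
from collections import defaultdict


def build_crosswalk(pairs):
    """Build a lookup from (source_code, target_code) pairs.

    A source code may appear in several pairs (one-to-many mapping).
    """
    lookup = defaultdict(set)
    for source, target in pairs:
        lookup[source].add(target)
    return dict(lookup)


def translate(codes, crosswalk):
    """Return (matched, unmatched): target codes per source code, and
    the source codes that the crosswalk does not cover."""
    matched = {}
    unmatched = []
    for code in codes:
        if code in crosswalk:
            matched[code] = sorted(crosswalk[code])
        else:
            unmatched.append(code)
    return matched, unmatched


# Hypothetical codes: "A1" splits into two codes in the target system.
pairs = [("A1", "X10"), ("A1", "X11"), ("B2", "X20")]
crosswalk = build_crosswalk(pairs)
matched, unmatched = translate(["A1", "B2", "C3"], crosswalk)
print(matched)
print(unmatched)
```

Reporting the unmatched codes separately, rather than dropping them silently, makes it easier to see where two classification systems fail to line up.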
NLS Web-Investigator (Center for Human Resource Research (CHRR), Ohio State University) The NLS Web-Investigator is a web-based interface to documentation and data from all the cohorts of the National Longitudinal Study (NLS). Like its predecessor, CHRR DB-Investigator, Web-Investigator allows users to search the database by variable name, question text, survey year and question number. Users can view the codebook information associated with variables, select and extract variables, and create a codebook unique to the variables chosen. Web-Investigator provides value labels in the statistical results files. A weighting program option lets users create a custom set of survey weights, making it easier to accurately calculate summary statistics from multiple years of data. Registered users can perform variable extractions without downloading any software or full data files, and can update and save their tag sets on the server for up to 90 days. Result files can be saved to a local computer or left in a personal NLS Web-Investigator account for up to 4 days. Open Calls for Comment on Federal Data Collections (Association of Public Data Users (APDU)) The Association of Public Data Users provides an up-to-date spreadsheet, linked from this page, of current statistical issues for public comment as announced in the notices of the Federal Register. Calls for comment are organized first by agency, then by closing date for comments. Each call for comment is linked to the Federal Register page, along with a contact person. A second page within the spreadsheet lists already-expired calls for comment. Penn World Tables (PWT) (Center for International Comparisons, University of Pennsylvania) The Penn World Table (PWT) provides purchasing power parity and national income accounts converted to international prices for 188 countries for some or all of the years 1950-2004. The PWT is hosted at the Center for International Comparisons, University of Pennsylvania. 
PWT 6.2 carries data up to the year 2004, with 2000 as a base; 6.1 goes up to 2000 and uses the year 1996 as a base. Both PWT 6.2 and PWT 6.1 are presented in an online, menu-based utility that allows for output in HTML tables, SAS data input, or CSV. PWT 6.1 is also available in Excel tables, and the older PWT 5.6 in self-extracting programs. Population Research Institute (PRI) (Pennsylvania State University) The Population Research Institute (PRI) at Penn State focuses on research and training in the population sciences. Their initiatives include a library with a data archive (available from the links on the right in the box labelled "For PRI Affiliates"). The data archive makes available an online data extraction engine called SodaPop, at http://sodapop.pop.psu.edu/, that allows users to create data extracts and view documentation for the datasets included in the system. A variable-search function is available within datasets, but not across the entire collection. PRI affiliates may create extracts or download datasets for any of the SodaPop holdings, while non-affiliated users may fill out an online application form to request access for files that are not restricted. Any user, however, may search the data and view the documentation. The PRI website highlights a number of studies carried out under Penn State auspices, such as the TREMIN Research Program on Women’s Health; the Puerto Rican Maternal and Infant Health Study; and the Marital Instability over the Life Course Study. ProGAMMA Data Solutions (Rijksuniversiteit Groningen, Netherlands) The ProGAMMA site, in the Dutch language, sells data software. The SIByl database of software in the social & behavioral sciences in Europe, formerly hosted by iec PROGAMMA, is no longer available. 
Sample Size Calculator (Creative Research Systems) This site actually contains two calculators: one for determining the necessary sample size for a given population and desired confidence interval, and one for calculating the confidence interval for a given population and sample size. Scholars' Lab - Digital Resources (includes the former Geostat) (University of Virginia) The resource formerly known as Geostat at the University of Virginia library has been re-organized into the Scholars' Lab. The collections of numeric and geospatial data files, including the Internet-accessible data extraction tools, are now accessible through the Scholars' Lab digital resources page. SDMX - Statistical Data and Metadata Exchange (SDMX) According to the SDMX web site, “[T]he BIS, ECB, EUROSTAT, IMF, OECD, UN, and the World Bank have joined together to focus on business practices in the field of statistical information that would allow more efficient processes for exchange and sharing of data and metadata within the current scope of our collective activities. The goal is to explore common e-standards and ongoing standardization activities that could allow us to gain efficiency and avoid duplication of effort in our own work and possibly for the work of others in the field of statistical information.” The site describes and documents the work of the SDMX initiative. Social Research Update (University of Surrey, United Kingdom) This general reference periodical for beginning social science researchers is issued quarterly by the Department of Sociology, University of Surrey, Guildford, England. Previous issues have included such topics as Ethnographic writing, Archiving qualitative research data, and Secondary analysis of qualitative data. Social Science & Government Data Library (University of California, Berkeley)
The Social Science and Government Data Library (SSGDL) is a collaboration between the UC-Berkeley Library and UC DATA on the University of California, Berkeley campus. The SSGDL web site carries both an extraction system and FTP links for U.S. Census Data. The extraction system contains 1990 census data from SSTF1, SSTF2 (Ancestry of the Population of the US), SSTF3 (Persons of Hispanic Origin in the United States), and SSTF5 (Characteristics of Asian and Pacific Islander Population of the US). Users can pick both geographies and variables.
The FTP files available from the site include:
- Census 2000: Summary File 1 (SF1), Redistricting Data (P.L. 94-171), Race and Hispanic or Latino Summary
- 1990 Census: Congressional Districts in the U.S., Equal Employment Opportunity File, Public Law 94-171 data, Public Use Microdata Samples (PUMS) - 1% and 5% data, Summary Tape File 1B (includes PR files), Summary Tape File 3 (includes 3A, 3B, and 3C), Subject Summary Tape Files
- 1970 Census Fifth Count Special Tabulation
- County & City Databook 1988 and 1994
- Current Population Survey files between 1988 and 1993
- Economic Census Data (1987, 1992, 1997)
- TIGER/Line 1997 files
Downloaded FTP files use the "Go" extraction system. Society for Political Methodology (American Political Science Association) The Political Methodology section of the American Political Science Association offers a hyperlink to the Political Methodology Working Papers, 1995 to the present. Authors submit papers to be downloaded, abstracts are readable on the web, and the full text of the paper may be read or downloaded in PDF. STATS (The Statistical Assessment Service) The Statistical Assessment Service "examines the way that scientific, quantitative, and social research are presented by the media, and works with journalists to help them convey this material more accurately and effectively." STATS sets the record to rights in an engaging, direct manner. SuperSTAR (Space-Time Research (Melbourne, Australia) and Alta Plana Corporation) The SuperSTAR statistical tabulation suite is used for analysis and dissemination of demographic, social, survey, trade, and marketing data. The suite includes the SuperCROSS Windows module and the SuperWEB browser interface and provides integrated statistical calculations, confidentiality algorithms, charting, mapping, and data extraction. It runs against micro-data and summarized data cubes with a multi-lingual interface and support for multi-lingual metainformation. Note: this is a fee-based product. Survey Documentation and Analysis (SDA) Archive (University of California-Berkeley) SDA is a set of programs for the documentation and analysis of survey data. The programs can produce codebooks either for printing or for browsing on the World Wide Web. Data analysis programs in the package can be run in various ways, including online from a Web browser. 
Data available here include: GSS (General Social Survey) 1972-2004 Cumulative Datafile; NES (US National Election Study) cumulative back to 1948 plus individual years since 1996; some census microdata from the US and California; several Labor and Health surveys; and several surveys on racist attitudes and prejudice. The site also links to other data archives that use SDA. Also included at this site is information on the Data Documentation Initiative (DDI) and Instrument Documentation (IDOC), a project to develop network-browsable documentation for CAI instruments, including the SIPP. Survey Question Bank (UK Data Archive (UKDA) at the University of Essex) The Survey Question Bank is a reference source for question formats and wordings used on major social surveys in the UK. It provides supporting material on concepts and methodology, and aims to disseminate knowledge about survey data collection methods to achieve comparability of results. Swivel (Swivel)
Swivel offers a data sharing utility in the collaborative spirit of Web 2.0. As a self-proclaimed purveyor of “tasty data goodies,” Swivel’s mission is “to liberate the world's data and make it useful so new insights can be discovered and shared.” Anyone can upload data to the Swivel site, either via copy-and-paste or uploading from CSV, Excel or Google Spreadsheets. Those who upload can sort and filter, map geographical data, plot charts or graphs, describe and categorize and tag the dataset, cite the data source, and make it available for other users to comment and download. Swivel also provides an opportunity for “official” entities to certify that the data is coming from the organization that collected it. “Official” sources that have signed up so far include the OECD, the World Health Organization, the U.S. Department of Commerce, and the Newspaper Association of America.
The current Swivel site was billed as a preview edition as of November 2007. While the public edition is free, Swivel is also working on a fee-based private edition that will enable users to collaborate in a more secure environment and compare data without making it openly available.
The Center for Spatially Integrated Social Science (National Science Foundation) The CSISS site focuses on the importance of space, location, and place in social science research. The site features learning tools and bibliographies regarding GIS and social sciences, as well as a search engine and annotated links to spatial tools elsewhere on the web. In development is a data search engine intended for searching across social science data archives. The Center for International Earth Science Information Network (CIESIN) (Columbia University) The Center for International Earth Science Information Network (CIESIN) is a center within the Earth Institute at Columbia University. CIESIN works at the intersection of the social, natural, and information sciences, and specializes in on-line data and information management, spatial data integration and training, and interdisciplinary research related to human interactions in the environment. The web site features two metadata catalogs and downloadable data such as the China Dimensions data collection and the U.S. PUMA boundary files for 1990. The Higher Education Research Institute (University of California-Los Angeles) The Higher Education Research Institute is an "interdisciplinary center for research, evaluation, information, policy studies, and research training in postsecondary education." Based in the University of California-Los Angeles Graduate School of Education and Information Studies, HERI sponsors the CIRP survey of college freshmen as well as the CSS (College Student Survey). Sample formats of the CSS and CIRP (Cooperative Institutional Research Program) are available on the site in PDF. The site also includes links to recent research at the Institute as well as Institute publications. TranStats (U.S. Bureau of Transportation Statistics) TranStats comprises a broad collection of over 100 transportation datasets from various federal sources such as the Department of Transportation and the Census Bureau.
TranStats is searchable by keyword or category. Some of the data descriptions link to data stored on other sites; for the many datasets stored at TranStats, however, users have interactive control over which variables to download, in addition to interactive analysis tools (simple statistical summaries, time series and cross-tabulations, online graphics, and cut-and-paste of results into reports). A "mapping center" is also available through TranStats, carrying the National Transportation Atlas Databases (NTAD) and other transportation mapping tools. Note: TranStats was formerly known as the Intermodal Transportation Database. Trends in Health and Aging (U.S. National Center for Health Statistics) This site presents a collection of tables on trends in the health of older Americans showing data by age, sex, race and Hispanic origin. Using the Beyond 20/20 system, users can browse and manipulate tables online, or download the tables and software for additional features, including mapping and statistical tests. Tables are categorized into 19 topics: Chronic Conditions, Functional Status and Disability, Health Care Expenditures, Health Care Utilization, Health Insurance, Immunization, Incontinence, Injury, Life Expectancy, Living Arrangements, Mental Health, Mortality, Oral Health, Perceived Health Status, Population (Nation and State), Risk Factors, Socio-Economic Status, Special Equipment Use, and Use and Cost of Prescription Medication. U.S. Demography (CIESIN) Included here are informative explanations of the following datasets: Public Use Microdata Samples, Current Population Survey, Economic Census Data, County Business Patterns, County City Data Book, Statistical Abstract Supplement, National Economic Social and Environmental Databank, Regional Economic Information System, Enhanced County to County Migration 1985-1990, TIGER 1992 Boundaries, and STF3A Standard Extracts.
Understanding the 1990 Public Use Microdata Sample (PUMS) (UCLA) This document presents a general overview of PUMS, while specifically discussing the distribution of census questionnaires, privacy protection, selection of PUMS 5% data, and structure of the 1990 PUMS 5% data, which includes geographic, household, and person information. Unicon Website, CPS Utilities (Unicon) Unicon is the producer of CPS Utilities (which DISC has on CD-ROM for the March and October CPS) as well as other data utility products. Their web site includes a section called CPS on Web, which they describe as follows: "At no charge, you may make tables and graphs from the CPS data, make estimations, get summaries and statistical measures, search the documentation, and make your own variables as functions of the existing ones. For a fee, you may also download data extractions." Free registration required. University of Western Ontario Data Resources Library (University of Western Ontario, Canada) The Data Resources Library at the University of Western Ontario, Canada, is part of the Faculty of Social Sciences. In addition to their catalog of holdings, the site also carries the Internet Data Library System (IDLS), at http://janus.ssc.uwo.ca/idls/. The IDLS allows users to search, view variable descriptions and other documentation, and create extracts from a database of Data Resources Library holdings that are available for download and analysis to approved users. Most datasets have access restrictions, but a few are publicly available. The variable description search function is particularly powerful, since it enables a variable search across all datasets in the collection. Virtual Economy (Institute for Fiscal Studies & Macro-Economic Modelling Centre at Warwick University) The Virtual Economy web site provides a sophisticated online model of the British economy that enables users to make a variety of changes and see their effects at both a macro and micro-level.
Users can predict the effects on unemployment, growth, inflation, trade and the public finances by making changes to tax, spending and interest rates. WebCASPAR (National Science Foundation) WebCASPAR bills itself as an "integrated science and engineering resources data system." The database system is a collection of statistical data from several surveys in higher education from NSF and NCES via a web-based extraction form, allowing users to create tables (or view pre-defined tables). Includes institution-level data. Free registration is required to be able to customize the search fully. World Health Organization Statistical Information System (WHOSIS) (World Health Organization) Provides searching and browsing options for finding international health-related statistics on the WHO web site and beyond. Online databases accessible from the WHOSIS page include Core Health Indicators; Life Tables; Mortality; Tuberculosis; HIV/AIDS; Alcohol; and Global Health Atlas. The WHO data offerings are more extensive than is immediately apparent. Users may want to use the site-search function on the WHOSIS page. The site also offers what they call a WHOSIS query service, consisting of a Frequently Asked Questions document and a contact form to send a question to WHO staff. Worldmapper (University of Sheffield (UK) and University of Michigan) The Worldmapper site takes its catchphrase, “The World as You’ve Never Seen It Before,” and puts it into data-driven action, featuring cartograms that display global regions “re-sized according to the subject of interest.” A total-population world map, for example, displays India and Japan swollen to outsized proportions, while the United States looms large on the map of private spending on health-care and Southeastern Africa dominates the map of HIV prevalence. Some of the broad topics include health, education, transportation, communication, work, and housing, but the list continues to expand. 
Each map comes with a downloadable PDF poster and downloadable data files in Excel and OpenDoc format. The Worldmapper project is a collaboration between the University of Sheffield (UK), which hosts the site, and the University of Michigan. ZIP Code Resources Page (MABLE/Geocorr) This page describes a series of tools for helping users deal with 5-digit U.S. postal ZIP code areas. It focuses primarily on tools for linking ZIP codes to other geographies (such as counties, cities, metro areas) and to demographic information from the 1990 decennial census. Excellent explanation of the “messiness” of using the ZIP code as a geographic unit.