An Overview of the 2019 ICPSR Summer Program

This webinar is scheduled on Monday, 2019 at 1 PM CST. ICPSR Summer Program staff will discuss this year’s courses, scholarship opportunities (including the expanded Diversity Initiative scholarships supported by the Bill & Melinda Gates Foundation), registration, visitor information, and more. The presentation will be followed by a Q&A session. Webinar registration is at https://attendee.gotowebinar.com/register/5916092033573660417
The ICPSR Summer Program offers rigorous, hands-on training in statistics, quantitative methods, and data analysis for students and researchers of all skill levels and backgrounds. Participants learn how to understand data and gain valuable research skills that help them to advance their education and careers. From May through August 2019, the ICPSR Summer Program will offer more than 80 courses in Ann Arbor, Michigan and other cities around the world. Registration for all courses will open in early February 2019.


Immigration to the U.S., a Pew Research Center Mini-course

Mark Hugo Lopez, director of global migration and demography research at the Pew Research Center, and his team have created five short email lessons on immigration to the U.S.

1. Who are today’s U.S. immigrants?
2. Who are legal immigrants, and how do they come to the U.S.?
3. Who are unauthorized immigrants in the U.S.?
4. What is immigration’s impact on the U.S. population?
5. What do Americans think about immigrants?

Pew Research Center is a nonpartisan and non-advocacy fact tank. It has been studying immigration for more than a decade.

https://mailchi.mp/pewresearch.org/u-s-immigration-mini-course


NCHS releases new wave of NSFG data – December 19, 2018

The National Center for Health Statistics (NCHS) announced the release of the 2015-2017 National Survey of Family Growth (NSFG) public-use data files. “The 2015-2017 data file contains data from 10,094 interviews conducted between September 2015 and September 2017 (5,554 women and 4,540 men).” The release contains three data files: female respondent, female pregnancy, and male respondent. Data files are available in ASCII format, with SAS, SPSS, and Stata programming statement files. For more information, go to:

https://www.cdc.gov/nchs/nsfg.htm


Who Is Poor in Wisconsin? – December 18, 2018

According to the most recent Wisconsin Poverty Report, poverty is a major issue in many parts of Wisconsin, particularly in the very rural Northwest and the very urban Southeast. This report is published annually by the Wisconsin Poverty Project in the Institute for Research on Poverty (IRP) at the University of Wisconsin-Madison. It compares the Wisconsin Poverty Measure (WPM) and the market-income poverty measures to provide a nuanced picture of economic hardship in the state. The WPM uses data from the American Community Survey (ACS), the Supplemental Nutrition Assistance Program (SNAP, called FoodShare in Wisconsin), housing programs, and direct taxes, including refundable tax credits.

https://www.irp.wisc.edu/resources/who-is-poor-in-wisconsin/


Belgian Municipal Accounts for 16 Governments, 1870-1933 – November 14, 2018

The Data and Information Services Center (DISC) has added yet another research study to its online archive this year:

Belgian Municipal Accounts for 16 Governments, 1870-1933 is a study that contains information and data about the receipts and expenditures of 16 municipal governments in Belgium during the period 1870-1933. All of the information on receipts and expenditures was taken from various volumes of the ‘Annuaire Statistique de la Belgique’. While this source provides information about both the budget (intended receipts and expenditures) and accounts (actual receipts and expenditures) of municipal governments, only the latter is included in the data file.** Information about political parties, city council seats, and college seats is also provided.

This study was initially deposited at the Data and Program Library Service (which later became DISC) in 1979 by its principal investigator, Michael T. Aiken. Since then, the data files and documentation have been available by request only.

As of November 2018, users can access all study materials (e.g. codebook, data files) online; a free registration is required to download data files. The codebook is available in PDF format. The dataset is available in raw ASCII, Stata, and SPSS file formats, with the corresponding command files and data dictionaries.

**Please note that data is missing for all cities from 1914 to 1918.


ICPSR Studies Are Searchable in the UW-Madison Library Catalog – November 5, 2018

Cataloging and Metadata Services in the General Library System (GLS) added 10,000 records from the Inter-University Consortium for Political and Social Research (ICPSR) to the UW-Madison Library Catalog on October 16, 2018. Adding this metadata extends patrons’ search and discovery to studies held in one of the world’s major social science data archives. When you search the GLS catalog, https://www.library.wisc.edu/, studies from ICPSR will be displayed on your results page. You can follow the Online Access link to ICPSR to access a study. In Advanced Search, you can specify ICPSR in the Series field to search only ICPSR’s holdings in the campus library catalog. Please contact DISC if you have any questions about ICPSR’s collections.


PDF Data Conversion Tools – October 24, 2018

Have you ever found useful data in a PDF or text-based document and wondered if there was a simple way to transfer it into a spreadsheet format for statistical analysis? This question was recently posed to the staff here at DISC, and we were able to locate three software tools that can be used for this purpose. What follows is a brief review of each tool, along with a comparison of which may be the most suitable for particular user needs. (Please see Table 1 at the bottom for a quick glance at the core features of these three tools.)

ABBYY FineReader 14 (Standard and Corporate versions)

The standard version of ABBYY FineReader 14 allows users to edit a PDF, to make comments and collaborate with others within the PDF editor, and to convert PDFs, other documents, and scans into a variety of formats (e.g. Word, Excel, PowerPoint, etc.). The conversion feature can also combine multiple files into one converted document. The PDF editor also allows users to add images and text, to draw, to redact data, to add pages, and much more. ABBYY FineReader 14 also includes an OCR (Optical Character Recognition) editor which allows users to customize and verify recognized text, to recognize unusual characters and fonts, and to select certain areas of a document to recognize, among other things. Within this editor, users can compare an original document (e.g. the PDF version of a data table) with the OCR’d version, correct the text that was not recognized properly, and then save the corrected document in the desired format (e.g. an Excel spreadsheet).

The corporate version of ABBYY FineReader 14 contains these features along with additional capabilities, which include a side-by-side comparison of documents for the purpose of detecting differences and automated conversion of documents. The comparison feature of the corporate version does not offer editing capabilities, however, and seems to be more useful for comparing differences in content than differences in structure.

ABBYY offers versions of FineReader 14 for both Windows and Mac, as well as many other related software products. Pricing begins at $200 for the standard version and $399 for the corporate version; licensing is perpetual and upgrades for registered users are available. A 30-day free trial option is also offered.

PDFTables

PDFTables is a proprietary software tool that allows users to upload PDF files with tabular data, to preview them on a web page, and then to convert/download the preview in Excel, CSV, or XML document format. Both cloud-based and on-premises service models are offered, and a free trial option is available so that users can convert PDF files immediately via the website (https://pdftables.com/). Pricing is based on the number of pages converted. The free trial option includes 25 pages, with another 50 being made available upon completion of a free registration. PDFTables also includes an API (application programming interface) which allows users to automate PDF data extraction.

Tabula

Tabula is a free, open-source software tool that was created specifically for the purpose of extracting a data table from a PDF file and converting it into a spreadsheet-compatible format (e.g. CSV, TSV, JSON). The application can be downloaded from https://tabula.technology/ and the interface is accessible via a browser tab that opens each time the application is run. It contains a PDF viewer which displays an imported PDF file from which the data can be selected, previewed, and exported into the desired file format. Tabula is Windows, Mac, and Linux-compatible. It is only designed for text-based PDF files, not scanned documents and/or other document types. The application interface displays a history of files that the user has previously imported, and there is also the option to save custom selections as a template for future use.

The following examples display the varying capabilities of these three conversion tools:

PDF file with data table: Page11852 WhigOCRed

Example of the PDF file (which contains tabular data and has undergone optical character recognition using another program) converted into Excel spreadsheet format using ABBYY FineReader 14: ABBYY-Page11852 WhigOCRed

An example of the same file converted using Tabula: Tabula-Page11852 WhigOCRed

And, finally, using PDFTables: PDFtables- Page11852 WhigOCRed

Comparing these examples shows that precision and clarity vary by tool, with ABBYY FineReader 14 being the most accurate and Tabula the least. More manual editing is required within a spreadsheet after conversion when using PDFTables and Tabula. The PDF viewer within Tabula is not nearly as powerful as the one in ABBYY FineReader 14 when the same document is opened in both programs. However, we did note that Tabula achieved greater detail and precision when smaller portions of the data table within the PDF were selected for conversion at a given time. It was also somewhat beneficial to perform optical character recognition on the PDF file prior to conversion. Tabula and PDFTables do not have this capability, so you will need to use another program first to do this. In addition, neither tool offers the side-by-side document comparison options found in ABBYY FineReader 14, and PDFTables is only able to convert one page of a file at a time.

Overall, Tabula may be more appropriate when converting PDF-based data that is displayed in a simple format, when converting multiple files into the same basic format, when cost may be an issue, and/or when one needs to convert data into spreadsheet format on an infrequent basis. PDFTables may also be useful when the volume and complexity of tabular data are not an issue. On the other hand, ABBYY FineReader 14 seems to be the better tool if you are under time constraints in your work, need to convert a great deal of tabular data, and/or need to convert data that is presented in a more complex format.
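Because output from PDFTables and Tabula usually needs some cleanup, part of that manual editing can be scripted once the table has been exported as CSV. The following is only a minimal sketch using Python's standard library, not a feature of any of the three tools. The function names and the sample text are hypothetical, and it assumes that commas inside a number-like cell are thousands separators.

```python
import csv
import io


def clean_cell(cell):
    """Trim whitespace and convert number-like text (e.g. "1,234") to int or float."""
    text = cell.strip()
    # Assumption: commas inside a numeric cell are thousands separators.
    candidate = text.replace(",", "")
    try:
        return int(candidate)
    except ValueError:
        pass
    try:
        return float(candidate)
    except ValueError:
        # Not a number: keep the trimmed text as-is.
        return text


def clean_rows(csv_text):
    """Parse CSV text, drop rows that are entirely blank, and clean every cell."""
    cleaned = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip rows with no content in any cell
        cleaned.append([clean_cell(cell) for cell in row])
    return cleaned


# Hypothetical sample resembling a rough CSV export; not real data.
SAMPLE = 'County , Votes\n,\n Adams ,"1,204"\nBrown, 87.5 \n'

for row in clean_rows(SAMPLE):
    print(row)
```

A script like this only handles stray whitespace, blank rows, and number formatting. Misrecognized characters and misaligned columns still need to be checked against the original PDF.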

Table 1.

Software (Developed By)
  • ABBYY FineReader 14 (ABBYY)
  • PDFTables (The Sensible Code Company)
  • Tabula (journalists Manuel Aristarán, Mike Tigas, Jeremy B. Merrill, and Jason Das)

Capabilities

ABBYY FineReader 14
  • Converts PDFs, scans, and other documents into a variety of formats (e.g. Excel, PowerPoint).
  • Combines multiple files into one converted document.
  • Contains both PDF and OCR (Optical Character Recognition) editors.
  • Corporate version also includes automated document conversion capabilities and a side-by-side document comparison feature.

PDFTables
  • Windows, Mac, and Linux compatible.
  • Converts PDFs into Excel, CSV, or XML format (CSV and XML formats are offered through the on-premises service model).
  • Has an API which allows users to automate data extraction.
  • Offers both cloud-based and on-premises service models.

Tabula
  • Converts data tables located within a PDF file into spreadsheet-compatible format (e.g. CSV, TSV, JSON).
  • Allows for selective and/or partial conversion of data tables.
  • Windows, Mac, and Linux compatible.
  • Keeps a file history of previously imported PDFs.
  • Offers an option to save custom selections as templates for future use.

Limitations

ABBYY FineReader 14
  • Side-by-side document comparison feature in the corporate version does not include editing capabilities.
  • Requires minimal manual editing after conversion.

PDFTables
  • Does not include optical character recognition (OCR).
  • Can only convert one page at a time.
  • Requires some manual editing after conversion.

Tabula
  • Does not include optical character recognition (OCR).
  • Only designed for text-based PDF files, not scanned documents and/or other document types.
  • Requires considerable manual editing after conversion.

Cost

ABBYY FineReader 14
  • $200 for the standard version, $399 for the corporate version; perpetual licensing and upgrades are available for registered users.
  • Free 30-day trial option is available.

PDFTables
  • Pricing is structured according to the number of pages used (e.g. $30 for 1,000 pages).
  • Free 25-page credit trial option is available; an additional 50 pages are given with a free registration.

Tabula
  • Free, open-source tool.


United Nations Population Fund Releases Annual Population Report – October 17, 2018

Today, the United Nations Population Fund released its annual “State of the World Population” report (.pdf format, 152p.). The theme of this year’s report is “The Power of Choice: Reproductive Rights and the Demographic Transition.” The full report can be found at:

https://www.unfpa.org/swop-2018


A Tale of Two Data Projects: Curation at the Qualitative Data Repository

The Qualitative Data Repository (QDR) at Syracuse University will host a free webinar on how to curate and share qualitative research data on Wednesday, November 7th, at 12 PM (EST). Attendees can learn about two data projects deposited with QDR, following each from the initial contact through a variety of curation steps to its eventual publication. The webinar will also cover the challenges posed by sharing and publishing qualitative data. Please register at https://www.eventbrite.com/e/a-tale-of-two-data-projects-curation-at-the-qualitative-data-repository-tickets-51458200864.


Financial Characteristics of Cities in the United States, 1905-1930 – September 5, 2018

The following research study is now available in the online archive of the Data and Information Services Center (DISC):

Financial Characteristics of Cities in the United States, 1905-1930 (Christopher Curran, Principal Investigator)

Data was collected in the summer of 1969 from volumes of the following U.S. Bureau of the Census publications: Statistics of Cities Having a Population Over 30,000 and Financial Statistics of Cities Having a Population Over 30,000. Statistics were then compiled in order to describe the pattern of financial transactions in U.S. cities for the period 1905-1930, as part of an examination of regional and population differences.

Variables in this research study include: population; non-government costs; interest charges; public service enterprise payments; general government expenses; protection expenses; health expenses; sanitation expenses; highway expenses; charity, hospital, and correction expenses; education expenses; recreation expenses; miscellaneous expenses; general government outlays; health outlays; sanitation outlays; charity, hospital, and correction outlays; education outlays; recreation outlays; miscellaneous outlays; public service enterprise outlays; non-revenue receipts; general property taxes; special taxes; poll taxes; business taxes; special assessments; fines, forfeits, and escheats; subventions and grants; highway privileges; rent revenue; interest revenue; miscellaneous revenues; and public service enterprise revenue.

This study was initially deposited at the Data and Program Library Service (which later became DISC) in 1979 by Christopher Curran, an economics professor at Emory University in Atlanta, Georgia. Since then, data files and documentation for the study have been available by request only.

As of September 2018, users can access all study materials (e.g. codebook, data files) online; a free registration is required to download data files. The codebook is available in PDF format. The dataset is available in raw ASCII, Stata, and SPSS file formats, with the corresponding command files and data dictionaries.
