POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

From: Paul Johnson <[log in to unmask]>
Reply To: Political Methodology Society <[log in to unmask]>
Date: Fri, 6 Oct 2006 14:33:48 -0500

Benjamin Freeman wrote:
> I am having some difficulty working with the IMF Government Finance
> Statistics Data from the ICPSR Study No. 8624.  Specifically, I do not know
> exactly how to unpack the main data file.  Has anyone written code to unpack
> this file?  Or, does anyone know of an alternative source for this data that
> does not require unpacking?
>
You need to be more specific with your question.  I don't have any
trouble with "unpacking".

$ unzip 6279173.zip
Archive:  6279173.zip
 extracting: 6279173/ICPSR_08624/08624-manifest.txt
 extracting: 6279173/ICPSR_08624/08624-related_literature.txt
 extracting: 6279173/ICPSR_08624/08624-descriptioncitation.pdf
  inflating: 6279173/ICPSR_08624/DS0001/08624-0001-Codebook.pdf
  inflating: 6279173/ICPSR_08624/DS0001/08624-0001-Data.txt
  inflating: 6279173/ICPSR_08624/DS0002/08624-0002-Data.txt

Now, concerning the file 6279173/ICPSR_08624/DS0001/08624-0001-Data.txt,
I do see trouble.  It is a block of characters I don't recognize, and it
is certainly not encoded in ASCII or Unicode.  My colleagues here suspect
it might be encoded in the EBCDIC character set, but my first guess was
that somebody had edited it in a program that was popular in 1998, say
WordPerfect, and forgot to save it in plain-text format.  I don't think
it is EBCDIC, because:

$ recode EBCDIC..MSDOS test.txt
recode: test.txt failed: Invalid input in step `ANSI_X3.4-1968..ISO-8859-1'
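Another quick way to test the EBCDIC hypothesis, without recode, is to decode a sample of the file under a few candidate codecs and see which one yields mostly printable text.  Here is a minimal Python sketch of that idea.  It assumes Python's built-in `cp037` codec (EBCDIC US/Canada) is the relevant EBCDIC variant, which is only a guess (`cp500` is another common one), and the helper names are my own.  Note that a file made mostly of packed-decimal numbers should score poorly under every codec, because packed fields are binary values rather than characters, which would be consistent with the recode failure above.

```python
import string


def printable_ratio(text: str) -> float:
    """Fraction of characters that are printable ASCII (letters, digits, punctuation, whitespace)."""
    if not text:
        return 0.0
    return sum(ch in string.printable for ch in text) / len(text)


def score_codecs(sample: bytes, codecs=("ascii", "latin-1", "cp037")) -> dict:
    """Decode `sample` under each codec and report how text-like the result is.

    Undecodable bytes become U+FFFD, which counts as non-printable.
    cp037 is one common EBCDIC code page; the file may use a different one.
    """
    return {c: printable_ratio(sample.decode(c, errors="replace")) for c in codecs}
```

For example, `score_codecs(open("08624-0001-Data.txt", "rb").read(4096))` scores the first 4 KB of the data file; a ratio near 1.0 for `cp037` alone would point to EBCDIC text.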

So, back to the codebook.  I see this comment: "The data are
stored in packed zoned decimal format. A supplemental COBOL processing
program is available for use with this dataset."  The COBOL program,
called FUNPACK, is provided in the directory DS0002, and my guess is that
one needs a COBOL compiler to run it.  Here are the first few lines, in
case you need to be convinced.

  EBCDIC LINE
000010 IDENTIFICATION DIVISION.
000020 PROGRAM-ID. 'FUNPACK'.
000030 AUTHOR. KATHLEEN NELICK.
000040 INSTALLATION. IMF BUREAU OF STATISTICS.
000050 DATE-WRITTEN. FEBRUARY 1972.
000060 DATE-COMPILED.
000070*REMARKS. REFORMAT PACKED IFS DF RECORD TO INTERNAL DF RECORD.
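For anyone who would rather not revive a COBOL compiler: the codebook's phrase "packed zoned decimal" presumably refers to the standard IBM mainframe number formats, packed decimal (COBOL COMP-3, two BCD digits per byte with a sign in the last nibble) and zoned decimal (one EBCDIC digit per byte, with the sign in the last byte's zone).  The Python sketch below decodes a single field of each kind.  It assumes the usual IBM sign conventions (B or D negative, A, C, E, F positive), which I have not checked against FUNPACK; the function names are hypothetical, and the field offsets, lengths, and implied decimal places would have to come from the codebook or the FUNPACK source.

```python
from decimal import Decimal

NEGATIVE_SIGNS = (0xB, 0xD)  # IBM convention: B and D mean negative; A, C, E, F positive


def unpack_packed_decimal(field: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COBOL COMP-3) field.

    Every nibble is a BCD digit except the low nibble of the last byte,
    which holds the sign. `scale` is the number of implied decimal places.
    """
    if not field:
        raise ValueError("empty field")
    digits = []
    for byte in field:
        digits.extend((byte >> 4, byte & 0x0F))
    sign = digits.pop()  # the final nibble is the sign, not a digit
    if sign < 0xA or any(d > 9 for d in digits):
        raise ValueError(f"not valid packed decimal: {field.hex()}")
    value = int("".join(map(str, digits)))
    if sign in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-scale)


def unpack_zoned_decimal(field: bytes, scale: int = 0) -> Decimal:
    """Decode an EBCDIC zoned-decimal field (COBOL DISPLAY with a trailing sign).

    Each byte's low nibble is a digit; the high (zone) nibble of the last
    byte carries the sign, and the other zones are normally 0xF.
    """
    if not field:
        raise ValueError("empty field")
    digits = [byte & 0x0F for byte in field]
    sign = field[-1] >> 4
    if sign < 0xA or any(d > 9 for d in digits):
        raise ValueError(f"not valid zoned decimal: {field.hex()}")
    value = int("".join(map(str, digits)))
    if sign in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-scale)
```

With a record read as raw bytes, a call would look like `unpack_packed_decimal(record[20:26], scale=1)`, where the slice bounds and scale are placeholders for whatever the record layout actually specifies.  Any text fields in the same record would still need EBCDIC decoding, for instance with `bytes.decode("cp037")`.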

I tracked down a couple of COBOL compilers, but could not make any
progress compiling that code.

Googling around, I gather that several IMF datasets are distributed in
this format and that several people have asked what to do about them.
I don't have more time to spend on this, but if I were you, the first
thing I would try is PROC DATASOURCE in SAS, which claims to handle
IMF files.

Good luck.  I think you will be more likely to get useful answers if
you are clearer about what goes wrong.  When you leave us to track down
which dataset you are using and then guess what might be going wrong,
you are asking an awful lot.

--
Paul E. Johnson                       email: [log in to unmask]
Dept. of Political Science            http://pj.freefaculty.org
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700

**********************************************************
             Political Methodology E-Mail List
        Editor: Karen Long Jusko <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

********************************************************** 
