POLMETH Archives

Political Methodology Society

POLMETH@LISTSERV.WUSTL.EDU

Subject:
From: Stephen Purpura <[log in to unmask]>
Reply-To: Political Methodology Society <[log in to unmask]>
Date: Mon, 24 Jul 2006 13:43:22 -0400
Hi,

Given the interest in the recent PolMeth paper on text clustering by Kevin
Quinn, Burt Monroe, Michael Colaresi, and Michael Crespin, you might also be
interested to hear that an R-like project to build a more comprehensive
computer-assisted content analysis tool for use by researchers is now in the
works.

Expect more news later in the year about funding and project specifics, but
in the interim you can forward the email below in its entirety.

Regards,


Stephen Purpura
Master of Public Policy Candidate
John F. Kennedy School of Government
Harvard University
email: [log in to unmask]
phone: +1-617-314-2027
Skype: steveatksg


        -----Original Message-----
        From: Content Analysis News and Discussion
        To: [log in to unmask]
        Sent: 7/7/2006 3:39 PM
        Subject: [CONTENT] "Text commons" open source content analysis platform

        Hello, all;

        The foundation for which I work is investigating the possibility of
        investing in a large-scale project in the field of text/content
        analysis. In order to inform our decision process as we consider
        whether to proceed with this project, I would like to ask two
        questions of this community. I would suggest that we discuss the
        first by means of this list, and that interested parties answer the
        second by contacting me directly, offline, using the information at
        the bottom of the message.

        Before I ask my questions, I must stress that this is a preliminary
        inquiry, and that no firm decisions have been reached. I am seeking
        information, not proposals (the foundation does not accept
        unsolicited proposals).

        Having said that, let me give you some background. Briefly, we are
        considering funding a comprehensive series of extensions to the
        NCSA/ALG "T2K" ("Text to Knowledge") project
        (http://alg.ncsa.uiuc.edu/do/tools/d2k), which is an open-source,
        Java-based platform for the large-scale mining, analysis, and
        visualization of text data. These extensions would give T2K the
        ability to serve the needs of traditional content-analytical and
        text-markup communities, in addition to the text mining community
        that it already serves. Our two hopes are:

        (a) To create an open-source "text commons" that can provide a
        universal focus for freely available, text-analysis-related R&D, in
        much the same way that the R project (www.r-project.org) has
        coordinated the research activities of large numbers of academics,
        across many disciplines, who are united by a common interest in
        quantitative data analysis; and

        (b) To supplement the existing T2K tools (which are heavily focused
        on large-scale, quantitative text mining, of a type that has not
        historically been of much interest to the CONTENT community) with a
        series of tools aimed at supporting the analyst who pursues a more
        human-involvement-intensive analytical strategy (i.e., very much the
        typical member of the CONTENT community). Think of these tools as
        plug-ins to T2K, much the way you can plug tools into Internet
        Explorer or Firefox to add capability, but obviously much more
        powerful. They might handle, for example: easy preparation and
        formatting of text for different analytical purposes; the intensive
        markup of canonical texts by literary researchers; emergent coding
        of observational, interview, or focus-group transcripts for
        anthropologists, sociologists, or market researchers; analysis of
        open-ended survey questions; media content analysis (including
        screen-scraping of TV data as well as more 'traditional' text
        modes); Internet-focused text research; graphical visualization of
        analytical relationships encoded in the text; and other text-related
        analytical activities that are currently or traditionally
        under-served by open-source software.

        As a fringe benefit, we hope to bring the quantitative and
        qualitative text communities closer together--making it easy for a
        CONTENT member to investigate, say, latent semantic analysis, and
        just as easy for a machine-learning specialist to experiment with
        anthropological coding techniques. We are also committed to
        providing open-source alternatives in the text commons, in order to
        encourage broader access to sophisticated text tools and greater
        collaborative, scholarly engagement with the development of future
        tools and algorithms to enhance the commons.

        With that background information, here are my questions:

        1) What kinds of issues and concerns should we be thinking about as
        we discuss whether and how to proceed with this project? Does this
        project idea appeal? If so, what excites you? If not, what repels
        you? I am less interested in simple go/no-go answers than in this
        community's advice as to which kinds of tools and capabilities would
        be of greatest or least interest and of most or least immediate
        value.

        2) Is there an individual or group at your (not-for-profit)
        organization--or do you know an individual or group in a
        not-for-profit context elsewhere--who is already working or has
        worked on the development of text analysis tools and is amenable to
        an open-source development model? If so, I would like to hear about
        (and from) those individuals and groups. We have already contacted
        several humanities institutes and educational technology centers at
        various institutions in search of potential tool-makers, but I would
        like to cast a broader net and make sure we have a chance to speak
        with anyone who might possibly be a tool-provider for the text
        commons project. We want to know what people have done, are doing,
        and plan to do, so we can plan accordingly as we move forward.

        Many thanks in advance for your input. I am reachable at the contact
        information below; however, I am traveling for the next few weeks,
        so it may take me a day or two to respond to queries.

        As a long-time subscriber to CONTENT, I'm excited by the chance to
        be a part of a project like this. I hope you share my enthusiasm,
        but if not, I would certainly like to understand why. You are
        welcome to share this information request with anyone you like--but
        again, please be sure to clarify that this is a request for
        information, not a request for proposals. If and when we decide to
        proceed, I will be delighted to announce the news on this list.

        Best regards,  --Chris Mackie

        Christopher J. Mackie
        Associate Program Officer
        Program in Research in Information Technology
        The Andrew W. Mellon Foundation
        282 Alexander Rd.
        Princeton, NJ 08540
        609-924-9424
        646-274-6351 (fax)
        [log in to unmask]
        http://rit.mellon.org

        ---------------------------------------------------------
        CONTENT is the Internet mailing list for news and discussion of
        content analysis. For additional information (including information
        regarding "signoff" procedures), visit the Content Analysis
        Resources web site, at http://www.car.ua.edu.

________________________________

        CONTENT is the Internet mailing list for news and discussion of
        content analysis. For additional information (including information
        regarding "signoff" procedures), visit Content Analysis Resources
        <http://www.gsu.edu/car>.



**********************************************************
             Political Methodology E-Mail List
        Editor: Karen Long Jusko <[log in to unmask]>
**********************************************************
        Send messages to [log in to unmask]
  To join the list, cancel your subscription, or modify
           your subscription settings visit:

          http://polmeth.wustl.edu/polmeth.php

********************************************************** 
