Hi,
Given the interest in the recent PolMeth paper on text clustering by Kevin Quinn, Burt Monroe, Michael Colaresi, and Michael Crespin, you might also be interested in hearing that an R-like project to build a more comprehensive computer-assisted content analysis tool for use by researchers is now in the works.
Expect more news later in the year about funding and project specifics, but
in the interim you can forward the email below in its entirety.
Regards,
Stephen Purpura
Master of Public Policy Candidate
John F. Kennedy School of Government
Harvard University
email: [log in to unmask]
phone: +1-617-314-2027
Skype: steveatksg
-----Original Message-----
From: Content Analysis News and Discussion
To: [log in to unmask]
Sent: 7/7/2006 3:39 PM
Subject: [CONTENT] "Text commons" open source content analysis platform
Hello, all;
The foundation for which I work is investigating the possibility of investing in a large-scale project in the field of text/content analysis. In order to inform our decision process as we consider whether to proceed with this project, I would like to ask two questions of this community. I would suggest that we discuss the first by means of this list, and that interested parties answer the second by contacting me directly, offline, using the information at the bottom of the message.
Before I ask my questions, I must stress that this is a preliminary inquiry, and that no firm decisions have been reached. I am seeking information, not proposals (the foundation does not accept unsolicited proposals).
Having said that, let me give you some background. Briefly, we are considering funding a comprehensive series of extensions to the NCSA/ALG "T2K" ("Text to Knowledge") project (http://alg.ncsa.uiuc.edu/do/tools/d2k), which is an open-source, Java-based platform for the large-scale mining, analysis, and visualization of text data. These extensions would give T2K the ability to serve the needs of traditional content-analytical and text-markup communities, in addition to the text mining community that it already serves. Our two hopes are:
(a) To create an open-source "text commons" that can provide a universal focus for freely available, text-analysis-related R&D, in much the same way that the R project (www.r-project.org) has coordinated the research activities of large numbers of academics, across many disciplines, who are united by a common interest in quantitative data analysis; and
(b) To supplement the existing T2K tools (which are heavily focused on large-scale, quantitative text mining, of a type that has not historically been of much interest to the CONTENT community) with a series of tools aimed at supporting the analyst who pursues a more human-involvement-intensive analytical strategy (i.e., very much the typical member of the CONTENT community). Think of these tools as plug-ins to T2K, much the way you can plug tools into Internet Explorer or Firefox in order to add capability, but obviously much more powerful.
They might handle (e.g.): easy preparation and formatting of text for different analytical purposes; the intensive markup of canonical texts by literary researchers; emergent coding of observational, interview, or focus-group transcripts for anthropologists, sociologists, or market researchers; analysis of open-ended survey questions; media content analysis (including screen-scraping of TV data as well as more 'traditional' text modes); Internet-focused text research; graphical visualization of analytical relationships encoded in the text; and other text-related analytical activities that are currently or traditionally under-served in terms of open-source software.
As a fringe benefit, we hope to bring the quantitative and qualitative text communities closer together--making it easy for a CONTENT member to investigate (e.g.) latent semantic analysis, and just as easy for a machine-learning specialist to experiment with (e.g.) anthropological coding techniques. We are also committed to providing open-source alternatives in the text commons, in order to encourage broader access to sophisticated text tools and greater collaborative, scholarly engagement with the development of future tools and algorithms to enhance the commons.
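(Purely as an illustration of the kind of quantitative technique a CONTENT member might investigate through such a commons--and not as part of T2K or any proposed tool--latent semantic analysis can be sketched in a few lines of Python with numpy. The toy corpus and all names below are invented for illustration: documents are reduced to a low-dimensional latent space via a truncated SVD of the term-document matrix, then compared by cosine similarity.)

```python
import numpy as np

# Toy corpus (invented for illustration): two "computer" documents
# and two "graph" documents, with no shared vocabulary across topics.
docs = [
    "human machine interface computer system",
    "computer system user response time",
    "graph minors trees",
    "graph minors trees survey",
]

# Build a plain term-document count matrix (rows = terms, columns = docs).
vocab = sorted({w for d in docs for w in d.split()})
tdm = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# LSA core step: truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)
k = 2  # number of latent "topics" to keep
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: one document in latent space

def cos(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two graph-themed documents land close together in the latent
# space, while topically unrelated documents do not.
print(cos(doc_vecs[2], doc_vecs[3]))
print(cos(doc_vecs[0], doc_vecs[3]))
```

(Real corpora would typically use tf-idf weighting rather than raw counts before the SVD, but the decomposition step is the same.)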
With that background information, here are my questions:
1) What kinds of issues and concerns should we be thinking about as we discuss whether and how to proceed with this project? Does this project idea appeal? If so, what excites you? If not, what repels you? I am less interested in simple go/no-go answers than I am in this community's advice as to what kinds of tools and capabilities would be of greatest or least interest and most or least immediate value.
2) Is there an individual or group at your (not-for-profit) organization--or do you know an individual or group in a not-for-profit context elsewhere--who is already working or has worked on the development of text analysis tools and is amenable to an open-source development model? If so, I would like to hear about (and from) those individuals and groups. We have already contacted several humanities institutes and educational technology centers at various institutions in search of potential tool-makers, but I would like to cast a broader net and make sure we have a chance to speak with anyone who might possibly be a tool-provider for the text commons project. We want to know what people have done, are doing, and plan to do, so we can plan accordingly as we move forward.
Many thanks in advance for your input. I am reachable at the contact information below; however, I am traveling for the next few weeks, so it may take me a day or two to respond to queries.
As a long-time subscriber to CONTENT, I'm excited by the chance to be a part of a project like this. I hope you share my enthusiasm, but if not, I would certainly like to understand why. You are welcome to share this information request with anyone you like--but again, please be sure to clarify that this is a request for information, not a request for proposals. If and when we decide to proceed, I will be delighted to announce the news on this list.
Best regards, --Chris Mackie
Christopher J. Mackie
Associate Program Officer
Program in Research in Information Technology
The Andrew W. Mellon Foundation
282 Alexander Rd.
Princeton, NJ 08540
609-924-9424
646-274-6351 (fax)
[log in to unmask]
http://rit.mellon.org
---------------------------------------------------------
CONTENT is the Internet mailing list for news and discussion of content analysis. For additional information (including information regarding "signoff" procedures), visit the Content Analysis Resources web site, at http://www.car.ua.edu.
**********************************************************
Political Methodology E-Mail List
Editor: Karen Long Jusko <[log in to unmask]>
**********************************************************
Send messages to [log in to unmask]
To join the list, cancel your subscription, or modify
your subscription settings visit:
http://polmeth.wustl.edu/polmeth.php
**********************************************************