Thursday, December 30, 2004

Unstructured Information Management

Unstructured Information Management (UIM) is an area where IBM has been doing a lot of research and development.  I came across it in a NY Times article (http://www.nytimes.com/2004/12/26/business/yourmoney/26techno.html) entitled “At I.B.M., That Google Thing Is So Yesterday”.  The article refers to work being done under the direction of Arthur Ciccolo within IBM Research (http://www.research.ibm.com/UIMA/index.htm) at the T.J. Watson Research Center.

Their goal is to create an infrastructure that combines the various information extraction and knowledge discovery techniques effectively. Rather than having each technique re-process the information, can a shared infrastructure let them build on one another's work and deliver better throughput and results?  I’ve only begun to dig through their publications, but they recently had an entire IBM Systems Journal issue dedicated to this area of research (http://www.research.ibm.com/journal/sj43-3.html).
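
To make the "process once, reuse everywhere" idea concrete, here is a rough sketch in Python. It is not IBM's API; the names (SharedAnalysis, Annotation, tokenize, mark_capitalized, run_pipeline) are all made up for illustration. It assumes each analysis step writes its results as annotations into one shared structure that later steps read from.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token" or "capitalized"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass
class SharedAnalysis:
    """One document plus every annotation produced so far (hypothetical structure)."""
    text: str
    annotations: list = field(default_factory=list)

    def of_kind(self, kind):
        return [a for a in self.annotations if a.kind == kind]

    def covered(self, annotation):
        return self.text[annotation.start:annotation.end]


def tokenize(doc):
    """First step: record the span of each whitespace-separated token."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        pos = start + len(word)
        doc.annotations.append(Annotation("token", start, pos))


def mark_capitalized(doc):
    """Second step: reuses the token spans instead of splitting the text again."""
    for tok in doc.of_kind("token"):
        if doc.covered(tok)[0].isupper():
            doc.annotations.append(Annotation("capitalized", tok.start, tok.end))


def run_pipeline(text, steps):
    """Run each step in order over the same shared structure."""
    doc = SharedAnalysis(text)
    for step in steps:
        step(doc)
    return doc


doc = run_pipeline("IBM Research is in New York", [tokenize, mark_capitalized])
print([doc.covered(a) for a in doc.of_kind("capitalized")])
```

The point of the design is that mark_capitalized never touches the raw text layout itself; it only consumes what tokenize already produced, so adding a third or fourth technique does not mean re-processing the document from scratch.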

Kipp Jones

Wednesday, December 22, 2004

I'd like to explore the idea behind CyberINFOstructure a bit further. How is this different from the CyberINFRAstructure? What is meant by this distinction?

To date, a lot of high performance computing and the related infrastructure has centered on computing -- CPU, performance, bandwidth and throughput. The focus seems to be on the bottom layer, the infrastructure if you will.

The information that lives on top of this has been given some attention, but generally in a narrow way tied to a specific problem at hand. I believe there needs to be a more concerted effort to understand and create a more reliable infostructure on top of this infrastructure. This layer should support:

  • Information and source discovery
  • Information pedigree
  • Information access
  • Information classification and semantics
  • Information composition
  • Information translation
These capabilities reside on top of the basic infrastructure and provide common facilities for applications to access and use information that resides within the grid/enterprise/world.
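
As a first pass at what such common facilities might look like to an application, here is a toy in-memory sketch in Python. Everything in it (InfoItem, InfoLayer, the method names, the sample weather data) is hypothetical and only meant to give the capabilities above a shape; translation is left out, and a real layer would sit on distributed sources rather than a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class InfoItem:
    name: str
    content: str
    source: str                                        # pedigree: where it came from
    labels: set = field(default_factory=set)           # classification
    derived_from: list = field(default_factory=list)   # pedigree: parent item names


class InfoLayer:
    """Toy stand-in for the information layer (all names are assumptions)."""

    def __init__(self):
        self._items = {}

    def register(self, item):
        self._items[item.name] = item

    def discover(self, label):
        """Discovery: names of all items carrying the given label."""
        return sorted(n for n, i in self._items.items() if label in i.labels)

    def access(self, name):
        """Access: fetch an item by name."""
        return self._items[name]

    def pedigree(self, name):
        """Pedigree: the item's source followed by the sources of its ancestors."""
        item = self._items[name]
        sources = [item.source]
        for parent in item.derived_from:
            sources.extend(self.pedigree(parent))
        return sources

    def compose(self, name, parts):
        """Composition: build a new item from existing ones, keeping the lineage."""
        items = [self._items[p] for p in parts]
        combined = InfoItem(
            name=name,
            content="\n".join(i.content for i in items),
            source="composed",
            labels=set().union(*(i.labels for i in items)),
            derived_from=list(parts),
        )
        self.register(combined)
        return combined


layer = InfoLayer()
layer.register(InfoItem("rainfall", "12 mm", "station-7", {"weather"}))
layer.register(InfoItem("wind", "5 m/s", "station-9", {"weather"}))
layer.compose("daily-report", ["rainfall", "wind"])
print(layer.discover("weather"))
print(layer.pedigree("daily-report"))
```

The composed item remembers what it was built from, so an application can ask where a piece of information came from without knowing how it was assembled.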

I'll dig deeper into each of these topics next.


Tuesday, December 21, 2004

Focus on Research

Key topics of interest include:
  • Web Technologies (web services, semantic web, search, retrieval, etc.)
  • Information Quality and Availability (get it when you need it, proper semantics, verifiable pedigree, etc.)
  • Cyberinfrastructure (or, as I prefer, cyberINFOstructure -- a focus on the infrastructure that provides access to the information rather than the computing side of the infrastructure)

I intend to build on and improve these three pillars to understand how they can be applied to two areas:
  • Enterprise Computing
  • Scientific Computing
I believe there are some fundamental processes, algorithms, and capabilities that can be created to support both of these activities. Clearly each has specific needs that differ from the other's, and we may need to go down one path quite a ways to fully understand its particular needs, but ultimately I'd like to circle back to find the commonalities across these domains.

The next step is to identify and isolate relevant research related to these topics and begin to distill it into a more focused set of ideas.