Apache UIMA &
Semantic Search
      Tommaso Teofili
   tommaso@apache.org
Apache UIMA - what is it?
Unstructured Information Management Architecture

Architectural framework to manage (potentially large) volumes of
unstructured data

Former IBM Alphaworks project donated to ASF

Currently an Incubator podling ( http://incubator.apache.org/uima )

Apache UIMA is an OASIS standard ( http://www.oasis-open.org )
Apache UIMA - how?

Many pluggable reusable components (described via XML)
Analysis Engines (primitive or aggregates)
Asynchronous scaleout (JMS, Apache ActiveMQ)
Flow controllers
Type systems
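
The component model above can be illustrated with a toy sketch. This is not the UIMA API (UIMA is a Java framework configured through XML descriptors); it is a minimal Python model, under the assumption that a CAS can be reduced to a text plus a list of typed annotations. All names here (`Cas`, `Annotation`, `token_annotator`, `capitalized_annotator`, `run_aggregate`) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """A typed span over the document text (toy stand-in for a type system entry)."""
    type_name: str
    begin: int
    end: int


@dataclass
class Cas:
    """Toy stand-in for a CAS: the document text plus the annotations added so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]

    def select(self, type_name: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type_name == type_name]


# In this sketch a "primitive analysis engine" is just a function that mutates the CAS.
AnalysisEngine = Callable[[Cas], None]


def token_annotator(cas: Cas) -> None:
    """Primitive engine: marks every run of word characters as a Token."""
    for match in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", match.start(), match.end()))


def capitalized_annotator(cas: Cas) -> None:
    """Primitive engine: builds on Token annotations produced by an earlier engine."""
    for token in cas.select("Token"):
        if cas.covered_text(token)[0].isupper():
            cas.annotations.append(Annotation("Capitalized", token.begin, token.end))


def run_aggregate(engines: List[AnalysisEngine], cas: Cas) -> Cas:
    """Aggregate engine with a fixed flow: run the delegates in list order."""
    for engine in engines:
        engine(cas)
    return cas


if __name__ == "__main__":
    result = run_aggregate([token_annotator, capitalized_annotator],
                           Cas("Apache UIMA analyzes text"))
    for annotation in result.select("Capitalized"):
        print(annotation.type_name, result.covered_text(annotation))
```

The fixed-order loop in `run_aggregate` plays the role of the simplest possible flow controller; a different flow would only change how that function picks the next engine.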
Apache UIMA - what is it NOT?

It’s not a semantic search tool inherently
the “Lucas example”
the semantic search package for UIMA is not open source!
( http://www.alphaworks.ibm.com/tech/uima/download )
UIMA & Semantic Search
Metadata generation engine for CM systems
Data enrichment
Linked data
Jeopardy (see http://www.research.ibm.com/deepqa/faq.shtml#24 )
Let’s see...
RE Market Analysis & UIMA
Macpi: a real estate market analysis tool developed at DIA
   Webpipe (crawling and wrapping data)
   Apache UIMA
   Spring framework
Knowledge extraction
Extract metadata with Apache UIMA to build our search
Apache UIMA & AlchemyAPI
AlchemyAPI services (from Orchestr8) wrapped as UIMA AEs
Named-entity recognition, word disambiguation
   “Barack Obama” is http://dbpedia.org/resource/Barack_Obama

Exploiting linked data
   enriching free text with DBpedia, GeoNames, Freebase URIs

Combining with other UIMA AEs
   providing you with a reusable component to deal with Linked Data
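
The entity-linking idea on this slide can be sketched without calling any remote service. The snippet below is a minimal dictionary-based linker in Python. It does not use AlchemyAPI or the UIMA API, and the names `EntityAnnotation`, `ENTITY_DICTIONARY` and `link_entities` are hypothetical. The only dictionary entry is the one given on the slide; a real dictionary would hold many more surface forms.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EntityAnnotation:
    """A span of text linked to a Linked Data URI."""
    begin: int
    end: int
    surface: str
    uri: str


# Hypothetical dictionary; the single entry is the example from the slide.
ENTITY_DICTIONARY: Dict[str, str] = {
    "Barack Obama": "http://dbpedia.org/resource/Barack_Obama",
}


def link_entities(text: str, dictionary: Dict[str, str]) -> List[EntityAnnotation]:
    """Return one annotation per exact occurrence of a dictionary surface form."""
    annotations = []
    for surface, uri in dictionary.items():
        # re.escape keeps characters such as '.' in a surface form literal.
        for match in re.finditer(re.escape(surface), text):
            annotations.append(
                EntityAnnotation(match.start(), match.end(), surface, uri))
    return sorted(annotations, key=lambda a: a.begin)


if __name__ == "__main__":
    for entity in link_entities("Barack Obama visited Rome.", ENTITY_DICTIONARY):
        print(entity.begin, entity.end, entity.surface, entity.uri)
```

Exact string matching is the simplest choice here. It performs no disambiguation, which is precisely the part a service or a trained component would add.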
UIMA & Semantic Search



  it’s demo time!


Editor's Notes

  1. Hello everybody, I am Tommaso Teofili. I am from Rome, I work at Sourcesense, and I’m an Apache UIMA committer. In this talk I am going to tell you about Apache UIMA & Semantic Search.
  2. Apache UIMA stands for Unstructured Information Management Architecture; it's an architectural framework for managing large volumes of unstructured data. It was an IBM project donated to Apache. It's currently an Incubator podling and the new 2.3.0 release is coming. It has also been approved as an OASIS standard. By the way, I became an Apache UIMA committer in August.
  3. UIMA is made of components, which can be put in a pipeline to be executed. Every Apache UIMA component is described via an XML descriptor. The most important component is the Analysis Engine (AE), which is responsible for the actual analysis of documents. AEs can be combined using an aggregate XML descriptor. To manage large volumes of data, UIMA can scale using Asynchronous Scaleout. Each UIMA pipeline is made of AEs, one or more flow controllers, and CAS consumers, the components responsible for handling the output (usually CASes).
  4. Apache UIMA is not inherently a semantic search tool. The semantic search tool package for UIMA is not open source. Lucas is a CAS consumer for UIMA capable of putting UIMA annotations inside Lucene indexes, a semi-semantic approach.
  5. UIMA is well suited as a metadata generation engine for CM systems. It can enrich data with annotations, entities, categories and so on, possibly linking them. For those who know the US Jeopardy game, IBM is building a system able to answer open-domain questions in competition against humans, and it's based on UIMA-AS. We'll see two examples of the power of UIMA to boost semantic search.
  6. Here's the first one: recently, during my college studies, I used UIMA to extract metadata to build semantic searches on a real estate market analysis tool called Macpi. Zone statistics -> UIMA to find zones and prices and to filter out bad matches. Frequent listings -> disambiguation, data filtering.
  7. Another little demo I’m going to show is about integrating Apache UIMA and AlchemyAPI from Orchestr8. AlchemyAPI provides, as you might guess, services to extract entities from text, but what we are interested in here is using linked data to exploit, for example, the DBpedia and Freebase data clouds. So the power of UIMA here is again in putting many components together in a pipeline: once my system has learned, I can easily substitute this component with a UIMA dictionary using the ConceptMapper. I start with the Alchemy component, then I can add the OpenCalais component, my personal dictionary of specific terms, a machine learning engine based on what UIMA extracts, and so on. Here we’ll see indexing documents by concepts.
  8. Ok, let’s go to the demos!
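
As a rough illustration of "indexing documents by concepts" from note 7, the sketch below builds an inverted index from concept URIs to document ids. It is a plain Python model, not Lucas, Lucene or the UIMA API. The document ids and the function names `build_concept_index` and `search_by_concept` are hypothetical, and the input is assumed to be the URIs an earlier extraction step already found for each document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_concept_index(doc_concepts: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each concept URI to the set of ids of the documents that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, uris in doc_concepts.items():
        for uri in uris:
            index[uri].add(doc_id)
    return dict(index)


def search_by_concept(index: Dict[str, Set[str]], uri: str) -> List[str]:
    """Return the ids of all documents annotated with the given concept, sorted."""
    return sorted(index.get(uri, set()))


if __name__ == "__main__":
    obama = "http://dbpedia.org/resource/Barack_Obama"
    # Hypothetical extraction output: document id -> concept URIs found in it.
    extracted = {"doc1": [obama], "doc2": [], "doc3": [obama]}
    print(search_by_concept(build_concept_index(extracted), obama))
```

Searching by URI rather than by surface string means documents that mention the same entity under different names can still be retrieved together, provided the extraction step maps them to the same URI.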