Content Tagged with Information-Retrieval + webir

Focused crawler - Combine System Homepage

"Combine is an open system for crawling [harvesting and threshing (indexing)] Internet resources. It can be used both as a general and focused crawler."

Tag: open-source

Heritrix - Home Page

"Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project."

Tag: open-source

WIRE (Web Information Retrieval Environment) - Center for Web Research

"The WIRE project is an effort started by the Center for Web Research for creating an application for information retrieval, designed to be used on the Web."

Tag: open-source

Welcome to Nutch!

"Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc."

Tag: open-source
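The Combine entry describes a crawler that can run either as a "general" or a "focused" crawler. The sketch below illustrates that distinction in the abstract and is not how Combine (or any of the other projects listed) actually implements it. A general crawler expands its frontier breadth-first, ignoring page content. A focused crawler keeps a priority queue ordered by topic relevance and stops following links out of off-topic pages. The in-memory link graph `WEB`, the keyword-overlap `relevance` score, and the functions `general_crawl` and `focused_crawl` are all hypothetical, chosen so the example runs without network access.

```python
import heapq
from collections import deque

# Hypothetical in-memory "web" (illustration only): URL -> (page text, outgoing links).
WEB = {
    "a": ("information retrieval and web crawling", ["b", "c"]),
    "b": ("cooking recipes and kitchen tips", ["d", "f"]),
    "c": ("inverted index for information retrieval", ["e"]),
    "d": ("dessert recipes", []),
    "e": ("web crawler frontier design", []),
    "f": ("kitchen knives", []),
}

def relevance(text, topic):
    """Toy score: fraction of topic terms that occur as words in the page text."""
    words = set(text.lower().split())
    return sum(term in words for term in topic) / len(topic)

def general_crawl(seed, limit):
    """Breadth-first crawl: visit pages in discovery order, ignoring their content."""
    frontier, seen, order = deque([seed]), {seed}, []
    while frontier and len(order) < limit:
        url = frontier.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

def focused_crawl(seed, topic, limit, threshold=0.0):
    """Best-first crawl: expand links from the most topic-relevant pages first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier, seen, order = [(-1.0, seed)], {seed}, []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        order.append(url)
        text, links = WEB[url]
        score = relevance(text, topic)
        if score <= threshold:
            continue  # prune: do not follow links out of off-topic pages
        for link in links:
            if link not in seen:
                seen.add(link)
                # An unfetched child inherits its parent's relevance as its priority.
                heapq.heappush(frontier, (-score, link))
    return order

topic = ["information", "retrieval", "crawler"]
print(general_crawl("a", 10))         # ['a', 'b', 'c', 'd', 'f', 'e']
print(focused_crawl("a", topic, 10))  # ['a', 'b', 'c', 'e']
```

The off-topic page `b` is still fetched because its parent `a` was relevant, but the focused crawler does not expand it, so `d` and `f` are never visited. Scoring unfetched links by their parent page is one simple heuristic; real focused crawlers may also use anchor text or trained classifiers.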