Content Tagged with prodei + Software

Disco

"Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on an unreliable cluster of computers."

open-source: del.icio.us tag/open-source

SourceForge.net: Information Retrieval Toolkit

"High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML."

open-source: del.icio.us tag/open-source

MG4J: Managing Gigabytes for Java

"MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java."

open-source: del.icio.us tag/open-source

Jeff's Search Engine Caffè: Current Open Source Search Engine Libraries

"Here is my short list of the most important open source [free] information retrieval libraries being used today that are undergoing active development as of writing."

open-source: del.icio.us tag/open-source

Hypertable: An Open Source, High Performance, Scalable Database

"Hypertable is an open source project based on published best practices and our own experience in solving large-scale data-intensive tasks."

open-source: del.icio.us tag/open-source

COSIN - WP5 - index

"The main topic of the COSIN project is to develop a series of theoretical, graphical, analytical and computational tools to describe the complex behaviour of networks."

open-source: del.icio.us tag/open-source

Hbase - Lucene-hadoop Wiki

"Bigtable-like structured storage for Hadoop HDFS"

open-source: del.icio.us tag/open-source

Grub's Distributed Web Crawling Project

"Grub started back in 2000 with a simple concept of distributing part of the search process pipeline: crawling."

open-source: del.icio.us tag/open-source

Texmaker : Free LaTeX Editor

"Texmaker is a free LaTeX editor that integrates many tools needed to develop documents with LaTeX, in just one application. Texmaker runs on unix, macosx and windows systems and is released under the GPL license."

open-source: del.icio.us tag/open-source

JoBo

"JoBo is a simple program to download complete websites to your local computer. Internally it is basically a web spider. The main advantage to other download tools is that it can automatically fill out forms [...] and also use cookies for session handling."

open-source: del.icio.us tag/open-source

WebCAT :: A Web Content Analysis Tool

"WebCAT is an extensible tool to extract meta-data and generate RDF descriptions from existing Web documents."

open-source: del.icio.us tag/open-source

WebLA :: Web Linkage Analysis

"WebLA is a Java package for handling Web Graphs, implementing popular algorithms such as PageRank, HITS, CoCitation Similarity and SimRank. It is of particular interest for research in Information Retrieval, [...]"

open-source: del.icio.us tag/open-source

webgraph++

"Webgraph++: big graph, little footprint"

open-source: del.icio.us tag/open-source

The Boost Graph Library

"Part of the Boost Graph Library is a generic interface that allows access to a graph's structure, but hides the details of the implementation."

open-source: del.icio.us tag/open-source

Swish-e :: Home Page

"Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller."

open-source: del.icio.us tag/open-source

The Zettair Search Engine

"Zettair allows you to index and search HTML (or TREC) collections. It has been designed for simplicity as well as speed and flexibility, and its primary feature is the ability to handle large amounts of text."

open-source: del.icio.us tag/open-source

The Clair Library

"The Clair library is written in Perl and is intended to simplify a number of generic tasks in Natural Language Processing (NLP), Information Retrieval (IR), and Lexical Network Analysis. Its architecture also allows for external software to be plugged in [...]"

open-source: del.icio.us tag/open-source

TagSoup home page

"This is the home page of TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild [...]"

open-source: del.icio.us tag/open-source

Wayback - Home Page

"wayback is an open source java implementation of the Internet Archive Wayback Machine."

open-source: del.icio.us tag/open-source

wera - Home Page

"WERA (Web ARchive Access) is a freely available solution for searching and navigating archived web document collections."

open-source: del.icio.us tag/open-source

Internet Archive ARC access tools - Home Page

"This is home for Internet Archive ARC file access tools."

open-source: del.icio.us tag/open-source

TCatNG Toolkit :: Text Categorization via N-Grams

"The TCatNG Toolkit is a Java package that you can use to apply N-Gram analysis techniques to the process of categorizing text files. [Namely] categorizing documents by topic, detecting the author of a text, or recognizing the language [...]"

open-source: del.icio.us tag/open-source