Content Tagged with mapreduce + opensource

Cascading

Dataflow language, in Java, on top of Hadoop. Via Simon Willison.

opensource: del.icio.us tag/opensource

Welcome to Hama project

A library for performing matrix operations on MapReduce.

opensource: del.icio.us tag/opensource

happy - Google Code

Write your bulk processing jobs in Python (well, Jython) and then run them on your cloud via MapReduce. Perfect for those bulk content analysis jobs, or for running freebase.com, which is apparently what they do with it.

opensource: del.icio.us tag/opensource

MapReduce programming with Apache Hadoop - Java World

Could this be a way to process XACML policies quickly? Search algorithms?

opensource: del.icio.us tag/opensource

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

opensource: del.icio.us tag/opensource

Welcome to Hama project

Hama (which means hippopotamus in Korean) is a parallel matrix computation package that provides a library of matrix operations for large-scale processing development environments and a Map/Reduce framework for large-scale numerical analysis and data mining.

opensource: del.icio.us tag/opensource

happy - Google Code

A framework for running MapReduce jobs using Python.

opensource: del.icio.us tag/opensource

Disco

Disco is an open-source implementation of the Map-Reduce framework for distributed computing.

opensource: del.icio.us tag/opensource

Disco - Massive Data, Minimal Code

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on an unreliable cluster of computers. The Disco core is written in Erlang, a functional language designed for building robust, fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks, often in only tens of lines of code. This means that you can quickly write scripts to process massive amounts of data. Disco was started at Nokia Research Center as a lightweight framework for rapid scripting of distributed data processing tasks. So far Disco has been successfully used, for instance, in parsing and reformatting data, data clustering, probabilistic modelling, data mining, full-text indexing, and log analysis with hundreds of gigabytes of real-world data.

opensource: del.icio.us tag/opensource
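
To give a feel for the "tens of lines" claim, here is a minimal single-process sketch of the map/reduce pattern as a word count in plain Python. It does not use Disco's actual API: the names map_words, reduce_counts and run_job are made up for illustration, and a real framework would spread the map and reduce steps across a cluster rather than run them in one loop.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(lines):
    """Hypothetical driver: map every line, group values by key, then reduce.

    Everything runs in a single process here; a distributed framework
    would do the same three phases across many machines.
    """
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["massive data minimal code", "minimal code"]))
```

The grouping step in the middle is what a framework normally handles for you, which is why a job often reduces to just the two small functions.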
