FileMap: File-Based Map-Reduce

FileMap is a lightweight system for applying Unix-style file processing tools to large amounts of data stored in files. It provides full map-reduce functionality without requiring that you switch your processing to any particular language or runtime environment, install any special software, or have root on your storage and processing nodes.

Features

Usage

Example: Compute word frequencies in a text corpus. FileMap stores the files across a set of machines and executes the pipeline in parallel. The word lists are divided up across the nodes and tallied there:

$ fm store *.txt /etext/

$ fm map -i "/etext/*" "sed -f words.sed | fm split -n 9 |> sort | uniq -c"

Read more about this example and FileMap in the Wiki.
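
For comparison, here is a rough single-machine sketch of what the pipeline above computes, without FileMap. The contents of words.sed are not shown here, so tr stands in for it as an assumption, emitting one lowercase word per line; the sample files and the workdir and counts variables are made up for illustration.

```shell
#!/bin/sh
# Single-machine sketch of the word-frequency tally (no FileMap involved).
# Assumption: tr replaces the unseen words.sed and emits one lowercase
# word per line. The two sample files below are hypothetical.
workdir=$(mktemp -d)
printf 'the cat and the dog\n' > "$workdir/a.txt"
printf 'The dog barked\n' > "$workdir/b.txt"

# Tokenize, then bring identical words together with sort and tally
# each run of equal lines with uniq -c.
counts=$(cat "$workdir"/*.txt \
  | tr -cs '[:alpha:]' '\n' \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c)

printf '%s\n' "$counts"
rm -rf "$workdir"
```

Because uniq -c only counts adjacent identical lines, every occurrence of a word has to reach the same sort before the tally. On one machine that is automatic; in the FileMap command, dividing the word lists across the nodes is the step that has to take care of it.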

Installation

No installation by a privileged user is required. Just download the fm script and create a filemap.conf file describing your environment. You will need Python (≥ 2.4), ssh, rsync, and bash.

You can download this project in either zip or tar format. You can also browse the source, clone it with Git, and publish changes at GitHub.

Changelog

Reporting Problems

Issue tracking is hosted at Google Code.