Dryad This is a research prototype of the Dryad and DryadLINQ data-parallel processing frameworks running on Hadoop YARN. Dryad utilizes cluster services provided as part of Hadoop YARN to reliably execute distributed computations on a cluster of computers. DryadLINQ provides the LINQ programming model for distributed data processing and leverages Dryad for reliable execution. Dryad and DryadLINQ on YARN are still under active development. As a result, you should expect some fragility. Requirements A version of YARN built for Windows The BUILDING.txt file in the Hadoop YARN repository contains instructions on building YARN for Windows. Visual Studio 2012 Java Development Kit 1.6 A Windows YARN cluster composed of x64 machines Building Dryad 1) Clone the Dryad git repository. 2) Ensure that YARN_HOME environment variable is set. 3) Set the DRYAD_HOME environment variable to binary path (bin\Debug or bin\Release) under the directory Dryad was cloned to. 4) Use Visual Studio to open The Dryad solution file (Dryad.sln) located in the root of the repository and build the solution. 5) Run Build.bat in the Java directory at the top-level of the repository. The CLASSPATH will need to be set to the output of the 'yarn classpath' command. Cluster setup 1) Setup your YARN cluster as you normally would. 2) Copy the contents of the DRYAD_HOME directory to the location set by DRYAD_HOME on each compute node in the cluster. Notes The YARN interfaces used are current as of commit dfb83b8 in trunk. If you are running debug builds of the Dryad, also copy the files msvcp110d.dll and msvcr110d.dll to the DRYAD_HOME directory on each compute node. The article at http://msdn.microsoft.com/en-us/library/vstudio/aa985618.aspx describes how to do this. The HDFS implementation in Dryad currently only supports text files.