<html xmlns:MSHelp="http://msdn.microsoft.com/mshelp" xmlns:mshelp="http://msdn.microsoft.com/mshelp"><head><link rel="SHORTCUT ICON" href="./../icons/favicon.ico" /><style type="text/css">.OH_CodeSnippetContainerTabLeftActive, .OH_CodeSnippetContainerTabLeft,.OH_CodeSnippetContainerTabLeftDisabled { }.OH_CodeSnippetContainerTabRightActive, .OH_CodeSnippetContainerTabRight,.OH_CodeSnippetContainerTabRightDisabled { }.OH_footer { }</style><link rel="stylesheet" type="text/css" href="./../styles/branding.css" /><link rel="stylesheet" type="text/css" href="./../styles/branding-en-US.css" /><style type="text/css">
body
{
border-left:5px solid #e6e6e6;
overflow-x:scroll;
overflow-y:scroll;
}
</style><script src="./../scripts/branding.js" type="text/javascript"><!----></script><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Running a DryadLINQ job on HDInsight</title><meta name="Language" content="en-us" /><meta name="Microsoft.Help.Id" content="3596a79f-0714-43b0-b49a-ea9eeccb7326" /><meta name="Description" content="The process for running a DryadLINQ application on HDInsight 3.0 is a bit complicated. This is because HDInsight does not expose all of the &quot;raw&quot; Hadoop 2.2 protocols to clients outside the cluster." /><meta name="Microsoft.Help.ContentType" content="How To" /><meta name="BrandingAware" content="'true'" /><meta name="SelfBranded" content="true" /></head><body onload="onLoad()" class="primary-mtps-offline-document"><div class="OH_outerDiv"><div class="OH_outerContent"><table class="TitleTable"><tr><td class="OH_tdTitleColumn">Running a DryadLINQ job on HDInsight</td><td class="OH_tdRunningTitleColumn">DryadLINQ documentation</td></tr></table><div id="mainSection"><div id="mainBody"><span class="introStyle"></span><div class="introduction"><p>The process for running a DryadLINQ application on HDInsight 3.0 is a bit complicated. This is because
HDInsight does not expose all of the "raw" Hadoop 2.2 protocols to clients outside the cluster. In particular,
the only way to launch a job on a cluster is through the <a class="mtps-external-link" href="http://people.apache.org/~thejas/templeton_doc_latest/index.html" target="_blank">Templeton</a> REST APIs, which are wrapped by the <a class="mtps-external-link" title="Optional alternate text" href="http://hadoopsdk.codeplex.com/" target="_blank">Microsoft .NET SDK for Hadoop</a>. Templeton does not currently support native YARN applications such as DryadLINQ, so
the only jobs that can be launched from outside the cluster are Hadoop 1 jobs (MapReduce, Hive, Pig, and so on).
</p></div><h3 class="procedureSubHeading">What happens when your client program runs a job</h3><div class="subSection"><ol><li><p>The client DryadLINQ program determines all of the resources that will be needed in the job. It
checks whether each one is already present on the cluster (by comparing a hash of the binary) and uploads any that
are missing. Resources are uploaded to the cluster's default storage account, so that Hadoop 2.2 services like
YARN will be able to read them using wasb. (See <a class="mtps-external-link" title="Optional alternate text" href="http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-blob-storage/" target="_blank">Using Azure Blob storage with HDInsight</a> for an explanation of how wasb/hdfs interacts with Azure blob storage.)</p></li><li><p>The client serializes a description of the DryadLINQ YARN application into an XML file. This file contains
a list of the resources that the DryadLINQ Application Master needs in order to run, and a command line for the
application master. (See <a class="mtps-external-link" href="http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/" target="_blank">YARN concepts</a> for an explanation of application masters.) This XML file is uploaded to the cluster's
default container as <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;.xml.&lt;hash&gt;</em>.</p></li><li><p>The client calls the .NET Hadoop SDK to run a Hadoop Streaming job using the above XML file as input.</p></li><li><p>The .NET SDK calls the Templeton REST API on your cluster.</p></li><li><p>The Templeton REST server launches a MapReduce job called <span class="command">TempletonControllerJob</span> on
your cluster.</p></li><li><p>The controller job launches a second MapReduce job called <span class="command">streamjob&lt;someNumber&gt;.jar</span>
on your cluster.</p></li><li><p>The streaming job reads the XML file serialized in step 2 and launches the DryadLINQ YARN application master, which
then runs your program. The title of the DryadLINQ application is <span class="command">DryadLINQ.App</span> by
default, but you can set it to something more friendly using the <span class="code">JobFriendlyName</span> property
of the <span class="code">DryadLinqContext</span>.</p></li><li><p>The streaming job writes the YARN application ID for the DryadLINQ application back to the cluster's default
container as <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;/part.00000</em>.</p></li><li><p>The DryadLINQ application writes heartbeat, logging and status information into a container called
<em>dryad-jobs/&lt;yarn-application-id&gt;</em> in the cluster's default storage account.</p></li><li><p>The client code reads the application ID from <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;/part.00000</em>
and then monitors <em>dryad-jobs/&lt;yarn-application-id&gt;</em> to get updates on the progress of the job.
This is also where the job browser gets its information about the job.</p></li></ol><p>If you <a class="mtps-external-link" href="http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/" target="_blank">enable Remote Desktop on your HDInsight cluster</a> and click the <span class="command">Hadoop YARN Status</span> shortcut on the desktop, you can see all these
jobs running.</p><p>Unfortunately, because of the current configuration of HDInsight clusters, all DryadLINQ logs are deleted immediately
when the application exits, and you will get a "Failed redirect for container" error if you try to navigate to the logs of
a completed application. We have tried to report errors from user application code so that they are visible in the
<a href="91822db3-8a00-4307-ad8a-595c94f449b0.htm" target="">DryadLINQ Job Browser</a>, to avoid the need to consult
the logs.</p></div></div></div></div></div><div id="OH_footer" class="OH_footer"><p /><div class="OH_feedbacklink"><a href="mailto:?subject=DryadLINQ+documentation+Running+a+DryadLINQ+job+on+HDInsight+100+EN-US&amp;body=Your%20feedback%20is%20used%20to%20improve%20the%20documentation%20and%20the%20product.%20Your%20e-mail%20address%20will%20not%20be%20used%20for%20any%20other%20purpose%20and%20is%20disposed%20of%20after%20the%20issue%20you%20report%20is%20resolved.%20While%20working%20to%20resolve%20the%20issue%20that%20you%20report%2c%20you%20may%20be%20contacted%20via%20e-mail%20to%20get%20further%20details%20or%20clarification%20on%20the%20feedback%20you%20sent.%20After%20the%20issue%20you%20report%20has%20been%20addressed%2c%20you%20may%20receive%20an%20e-mail%20to%20let%20you%20know%20that%20your%20feedback%20has%20been%20addressed.">Send Feedback</a> on this topic.</div></div></body></html>