<html xmlns:MSHelp="http://msdn.microsoft.com/mshelp" xmlns:mshelp="http://msdn.microsoft.com/mshelp"><head><link rel="SHORTCUT ICON" href="./../icons/favicon.ico" /><style type="text/css">.OH_CodeSnippetContainerTabLeftActive, .OH_CodeSnippetContainerTabLeft,.OH_CodeSnippetContainerTabLeftDisabled { }.OH_CodeSnippetContainerTabRightActive, .OH_CodeSnippetContainerTabRight,.OH_CodeSnippetContainerTabRightDisabled { }.OH_footer { }</style><link rel="stylesheet" type="text/css" href="./../styles/branding.css" /><link rel="stylesheet" type="text/css" href="./../styles/branding-en-US.css" /><style type="text/css">
body
{
border-left:5px solid #e6e6e6;
overflow-x:scroll;
overflow-y:scroll;
}
</style><script src="./../scripts/branding.js" type="text/javascript"><!----></script><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>Running a DryadLINQ job on HDInsight</title><meta name="Language" content="en-us" /><meta name="Microsoft.Help.Id" content="3596a79f-0714-43b0-b49a-ea9eeccb7326" /><meta name="Description" content="The process for running a DryadLINQ application on HDInsight 3.0 is a bit complicated. This is because HDInsight does not expose all of the &quot;raw&quot; Hadoop 2.2 protocols to clients outside the cluster." /><meta name="Microsoft.Help.ContentType" content="How To" /><meta name="BrandingAware" content="'true'" /><meta name="SelfBranded" content="true" /></head><body onload="onLoad()" class="primary-mtps-offline-document"><div class="OH_outerDiv"><div class="OH_outerContent"><table class="TitleTable"><tr><td class="OH_tdTitleColumn">Running a DryadLINQ job on HDInsight</td><td class="OH_tdRunningTitleColumn">DryadLINQ documentation</td></tr></table><div id="mainSection"><div id="mainBody"><span class="introStyle"></span><div class="introduction"><p>The process for running a DryadLINQ application on HDInsight 3.0 is a bit complicated. This is because
HDInsight does not expose all of the "raw" Hadoop 2.2 protocols to clients outside the cluster. In particular,
the only way to launch a job on a cluster is through the <a class="mtps-external-link" href="http://people.apache.org/~thejas/templeton_doc_latest/index.html" target="_blank">Templeton</a> REST APIs, which are wrapped by the <a class="mtps-external-link" title="Optional alternate text" href="http://hadoopsdk.codeplex.com/" target="_blank">Microsoft .NET SDK for Hadoop</a>. Unfortunately, Templeton does not currently support native YARN applications like DryadLINQ, and so
the only jobs that may be launched from outside the cluster are Hadoop 1 jobs (MapReduce, Hive, Pig, and so on).
</p></div><h3 class="procedureSubHeading">What happens when your client program runs a job</h3><div class="subSection"><ol><li><p>The client DryadLINQ program determines all of the resources that will be needed in the job. It
checks to see if they are already present on the cluster (using a hash of the binary) and uploads any that
are not present. They are uploaded to the default cluster storage account, so that Hadoop 2.2 services like
YARN will be able to read them using wasb. (See <a class="mtps-external-link" title="Optional alternate text" href="http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-blob-storage/" target="_blank">Using Azure Blob storage with HDInsight</a> for an explanation of how wasb/hdfs interacts with Azure blob storage.)</p></li><li><p>The client serializes a description of the DryadLINQ YARN application into an XML file. This file contains
a list of the resources that the DryadLINQ Application Master needs in order to run, and a command line for the
application master. (See <a class="mtps-external-link" href="http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/" target="_blank">YARN concepts</a> for an explanation of application masters.) This XML file is uploaded to the cluster's
default container as <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;.xml.&lt;hash&gt;</em>.</p></li><li><p>The client calls the .NET Hadoop SDK to run a Hadoop Streaming job using the above XML file as input.</p></li><li><p>The .NET SDK calls the Templeton REST API on your cluster.</p></li><li><p>The Templeton REST server launches a MapReduce job called <span class="command">TempletonControllerJob</span> on
your cluster.</p></li><li><p>The controller job launches a second MapReduce job called <span class="command">streamjob&lt;someNumber&gt;.jar</span>
on your cluster.</p></li><li><p>The streaming job reads the XML file serialized above and launches the DryadLINQ YARN application master, which
then actually runs your program. The title of the DryadLINQ application is <span class="command">DryadLINQ.App</span> by
default, but you can set it to something more friendly using the <span class="code">JobFriendlyName</span> property
of the <span class="code">DryadLinqContext</span>.</p></li><li><p>The streaming job writes the YARN application ID for the DryadLINQ application back to the cluster's default
container as <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;/part.00000</em>.</p></li><li><p>The DryadLINQ application writes heartbeat, logging and status information into a container called
<em>dryad-jobs/&lt;yarn-application-id&gt;</em> in the cluster's default storage account.</p></li><li><p>The client code reads the application ID from <em>user/&lt;yourUserName&gt;/staging/&lt;jobGuid&gt;/part.00000</em>
and then monitors <em>dryad-jobs/&lt;yarn-application-id&gt;</em> to get updates on the progress of the job.
This is also where the job browser gets its information about the job.</p></li></ol><p>If you <a class="mtps-external-link" href="http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/" target="_blank">Enable Remote Desktop on your HDInsight cluster</a>, and click on the <span class="command">Hadoop YARN Status</span> shortcut link on the desktop, you can see all these
jobs running.</p><p>Unfortunately, because of the current configuration of HDInsight clusters, all DryadLINQ logs are deleted immediately
when the application exits, and you will get a "Failed redirect for container" error if you try to navigate to the logs of
a completed application. We have tried to surface errors from user application code so that they are visible in the
<a href="91822db3-8a00-4307-ad8a-595c94f449b0.htm" target="">DryadLINQ Job Browser</a> to avoid the need to consult
the logs.</p></div></div></div></div></div><div id="OH_footer" class="OH_footer"><p /><div class="OH_feedbacklink"><a href="mailto:?subject=DryadLINQ+documentation+Running+a+DryadLINQ+job+on+HDInsight+100+EN-US&amp;body=Your%20feedback%20is%20used%20to%20improve%20the%20documentation%20and%20the%20product.%20Your%20e-mail%20address%20will%20not%20be%20used%20for%20any%20other%20purpose%20and%20is%20disposed%20of%20after%20the%20issue%20you%20report%20is%20resolved.%20While%20working%20to%20resolve%20the%20issue%20that%20you%20report%2c%20you%20may%20be%20contacted%20via%20e-mail%20to%20get%20further%20details%20or%20clarification%20on%20the%20feedback%20you%20sent.%20After%20the%20issue%20you%20report%20has%20been%20addressed%2c%20you%20may%20receive%20an%20e-mail%20to%20let%20you%20know%20that%20your%20feedback%20has%20been%20addressed.">Send Feedback</a> on this topic.</div></div></body></html>