Apache Pig - Grunt Shell

After invoking the Grunt shell, you can run Pig scripts in it. In addition, the Grunt shell provides a number of useful shell and utility commands. This chapter explains those commands.

Note: Some parts of this chapter use commands such as Load and Store. Refer to the respective chapters for details on them.

Shell Commands

The Grunt shell of Apache Pig is mainly used to write Pig Latin scripts. Beyond that, it can invoke shell and file system commands through sh and fs.

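sh Command

Using the sh command, you can invoke shell commands from the Grunt shell. However, commands that are part of the shell environment itself (for example, cd) cannot be executed this way.

Syntax

Given below is the syntax of the sh command.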

grunt> sh shell command parameters

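Example

You can invoke the Linux shell's ls command from the Grunt shell using the sh command, as shown below. In this example, the current working directory is /pig/bin/, so the command lists the files in that directory.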

grunt> sh ls

pig
pig_1444799121955.log
pig.cmd
pig.py
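Arguments after sh are passed on to the shell command. The following is a minimal sketch that assumes a Linux environment where pwd and ls are available; the output depends on the machine and is not shown.

```pig
grunt> sh pwd
grunt> sh ls -l
```

Both commands run against the directory from which the Grunt shell was started, because cd cannot be used through sh.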

fs Command

Using the fs command, you can invoke FsShell commands from the Grunt shell.

Syntax

Given below is the syntax of the fs command.

grunt> fs File System command parameters

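Example

You can invoke the HDFS ls command from the Grunt shell using the fs command. The following example lists the files in an HDFS directory. Because no path is given, the listing is for the current user's HDFS home directory; pass a path such as / to list a different directory.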

grunt> fs -ls

Found 3 items
drwxrwxrwx   - Hadoop supergroup          0 2015-09-08 14:13 Hbase
drwxr-xr-x   - Hadoop supergroup          0 2015-09-09 14:52 seqgen_data
drwxr-xr-x   - Hadoop supergroup          0 2015-09-08 11:30 twitter_data

In the same way, you can invoke all the other file system shell commands from the Grunt shell using the fs command.
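As an illustration of other file system commands, the sketch below creates the /pig_data directory used later in this chapter, copies a file into it, and prints the file. It assumes that a file named student.txt exists in the local working directory and that /pig_data does not exist yet in HDFS.

```pig
grunt> fs -mkdir /pig_data
grunt> fs -put student.txt /pig_data/
grunt> fs -cat /pig_data/student.txt
```

The arguments after fs are the same ones you would give to the Hadoop file system shell on the command line.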

Utility Commands

The Grunt shell provides a set of utility commands. These include clear, help, history, quit, and set, as well as exec, kill, and run, which control Pig from the Grunt shell. The utility commands are described below.

clear Command

The clear command is used to clear the screen of the Grunt shell.

Syntax

You can clear the screen of the Grunt shell using the clear command, as shown below.

grunt> clear

help Command

The help command gives you a list of Pig commands or Pig properties.

Usage

You can get a list of Pig commands using the help command, as shown below.

grunt> help

Commands:
<pig latin statement>; - See the PigLatin manual for details:
http://hadoop.apache.org/pig

File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command:
       http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic Commands:
    describe <alias>[::<alias] - Show the schema for the alias.
       Inner aliases can be described as A::B.
    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
       [-param <param_name>=<param_value>]
       [-param_file <file_name>] [<alias>] -
       Show the execution plan to compute the alias or for entire script.
       -script - Explain the entire script.
       -out - Store the output into directory rather than print to stdout.
       -brief - Don't expand nested plans (presenting a smaller graph for overview).
       -dot - Generate the output in .dot format. Default is text format.
       -xml - Generate the output in .xml format. Default is text format.
       -param <param_name - See parameter substitution for details.
       -param_file <file_name> - See parameter substitution for details.
       alias - Alias to explain.
    dump <alias> - Compute the alias and writes the results to stdout.

Utility Commands:
    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
       Execute the script with access to grunt environment including aliases.
       -param <param_name - See parameter substitution for details.
       -param_file <file_name> - See parameter substitution for details.
       script - Script to be executed.
    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
       Execute the script with access to grunt environment.
       -param <param_name - See parameter substitution for details.
       -param_file <file_name> - See parameter substitution for details.
       script - Script to be executed.
    sh  <shell command> - Invoke a shell command.
    kill <job_id> - Kill the hadoop job specified by the hadoop job id.
    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
       The following keys are supported:
       default_parallel - Script-level reduce parallelism. Basic input size heuristics used
       by default.
       debug - Set debug on or off. Default is off.
       job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
       job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high.
       Default is normal
       stream.skippath - String that contains the path. This is used by streaming.
       any hadoop property.
    help - Display this message.
    history [-n] - Display the list statements in cache.
       -n Hide line numbers.
    quit - Quit the grunt shell.

history Command

This command displays the list of statements executed so far since the Grunt shell was invoked.

Usage

Assume you have executed three statements since opening the Grunt shell.

grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');

grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');

Then, using the history command produces the following output.

grunt> history

customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');

orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');

set Command

The set command is used to show or assign values to keys used in Pig.

Usage

Using this command, you can set values for the following keys.

Key	Description and values
default_parallel	You can set the number of reducers for MapReduce jobs by passing any whole number as the value of this key.
debug	You can turn the debugging feature in Pig on or off by passing on or off to this key.
job.name	You can set the name of the job by passing a string value to this key.
job.priority	You can set the priority of a job by passing one of the following values to this key: very_low, low, normal, high, very_high.
stream.skippath	For streaming, you can set the path from which data is not to be transferred by passing the desired path as a string to this key.
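The keys in the table above can be set as in the following sketch. The values are illustrative assumptions (10 reducers, a job named student_load), not recommendations; the job name is single-quoted, as the help text requires.

```pig
grunt> set default_parallel 10
grunt> set debug on
grunt> set job.name 'student_load'
grunt> set job.priority high
```

Keys and values are case sensitive, so write them exactly as listed.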

quit Command

You can quit the Grunt shell using this command.

Usage

Quit the Grunt shell as shown below.

grunt> quit

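Let us now look at the commands with which you can control Apache Pig from the Grunt shell.

exec Command

Using the exec command, you can execute Pig scripts from the Grunt shell.

Syntax

Given below is the syntax of the utility command exec.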

grunt> exec [-param param_name=param_value] [-param_file file_name] script

Example

Assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.

student.txt

001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi

Also assume there is a script file named sample_script.pig in the /pig_data/ directory of HDFS with the following content.

sample_script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
   as (id:int,name:chararray,city:chararray);

Dump student;

Now, execute the above script from the Grunt shell using the exec command, as shown below.

grunt> exec /sample_script.pig

Output

The exec command executes the script in sample_script.pig. As directed in the script, it loads the student.txt file into Pig and prints the result of the Dump operator, shown below.

(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)
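The -param option in the exec syntax supplies a value for a parameter that the script references with a leading $. The sketch below is an assumption-laden illustration: the script name sample_script_param.pig and the parameter name input are hypothetical and do not appear elsewhere in this chapter.

```pig
-- Contents of the hypothetical script /sample_script_param.pig:
--   student = LOAD '$input' USING PigStorage(',')
--      as (id:int,name:chararray,city:chararray);
--   Dump student;

-- Run it from the Grunt shell, substituting $input with the HDFS path:
grunt> exec -param input=hdfs://localhost:9000/pig_data/student.txt /sample_script_param.pig
```

With the same student.txt as above, this should load and dump the same three records; only the way the path reaches the script differs.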

kill Command

You can kill a job from the Grunt shell using this command.

Syntax

Given below is the syntax of the kill command.

grunt> kill JobId

Example

Suppose there is a running Pig job with the ID Id_0055. You can kill it from the Grunt shell using the kill command, as shown below.

grunt> kill Id_0055

run Command

Using the run command, you can run a Pig script from the Grunt shell.

Syntax

Given below is the syntax of the run command.

grunt> run [-param param_name=param_value] [-param_file file_name] script

Example

Assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.

student.txt

001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi

And assume there is a script file named sample_script.pig in the local file system with the following content.

sample_script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);

Now, run the above script from the Grunt shell using the run command, as shown below.

grunt> run /sample_script.pig

The script only defines the student relation, so you can see its output by applying the Dump operator to that relation, as shown below.

grunt> Dump student;

(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)

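Note: The difference between exec and run is that with run, the statements from the script become available in the command history, and the aliases the script defines remain usable in the Grunt shell.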
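One way to observe this difference is to ask for the schema of the student alias after each command. This is a sketch that assumes a fresh Grunt session in which student has not been defined by hand, and the sample_script.pig shown above.

```pig
-- After exec, the script's aliases are not visible in the shell,
-- so this describe is expected to fail with an undefined-alias error.
grunt> exec /sample_script.pig
grunt> describe student;

-- After run, the alias is defined in the shell,
-- so this describe is expected to print the schema of student.
grunt> run /sample_script.pig
grunt> describe student;
```

exec therefore suits self-contained batch scripts, while run suits scripts whose relations you want to keep working with interactively.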
