Running word count with spark-submit

Step 1: Upload word.txt to HDFS

$ hadoop fs -put word.txt
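With no destination argument, `-put` copies the local file into the user's HDFS home directory. word.txt can be any plain-text file; the tutorial does not show its contents, so the file below is purely hypothetical:

```shell
# Hypothetical word.txt contents -- the actual file used in the tutorial is not shown.
cat > word.txt <<'EOF'
aaa bbb ccc ddd
ddd aaa aaa
EOF
```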

Step 2: Download the sample program

$ git clone https://github.com/ogre0403/Spark-101.git
Initialized empty Git repository in /home/ogre/Spark-101/.git/
remote: Counting objects: 9, done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 9 (delta 0), reused 9 (delta 0), pack-reused 0
Unpacking objects: 100% (9/9), done.

Step 3: Build the Spark program

$ mvn clean package
…
[INFO] Building jar: /home/ogre/Spark-101/target/spark-sample-0.0.1.jar
[INFO] --------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] --------------------------------------------------------------------
[INFO] Total time: 1.700 s
[INFO] Finished at: 2016-03-30T13:24:08+08:00
[INFO] Final Memory: 39M/1556M
[INFO] -------------------------------------------------------------------

Step 4: Run

$ spark-submit --master yarn \
    --deploy-mode cluster \
    --class WordCount  \
    spark-sample-0.0.1.jar word.txt output

Step 5: Check the output

$ hadoop fs -ls output
Found 3 items
-rw-r--r-- 3 ogre supergroup  0 2016-03-30 13:31 output/_SUCCESS
-rw-r--r-- 3 ogre supergroup  16 2016-03-30 13:31 output/part-00000
-rw-r--r-- 3 ogre supergroup  16 2016-03-30 13:31 output/part-00001

$ hadoop fs -cat output/part-00000
(bbb,1)
(ddd,2)
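Each line is a (word, count) tuple; the job has two reducers, so the pairs are split across part-00000 and part-00001. The same counting logic (map each word to a pair, then sum per key) can be sketched with standard shell tools -- this is only an illustration, not the sample program's actual implementation:

```shell
# Count words in a sample line and print Spark-style (word,count) tuples.
printf 'bbb ddd ddd\n' |
  tr -s ' ' '\n' |                       # split into one word per line
  sort | uniq -c |                       # count occurrences of each word
  awk '{printf "(%s,%d)\n", $2, $1}'     # format as (word,count)
# prints:
# (bbb,1)
# (ddd,2)
```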
