Description
In the 'Running a Job Using RecordService' section of the installOnCluster page, some commands are rendered on a single line and need to be wrapped so that each trailing backslash ends its line:
Log in to one of the nodes in your cluster, and load test data:
wget -q --no-clobber \
https://s3-us-west-1.amazonaws.com/recordservice-vm/tpch.tar.gz
tar -xzf tpch.tar.gz
hadoop fs -mkdir -p /test-warehouse/tpch.nation
hadoop fs -put -f tpch/nation/* /test-warehouse/tpch.nation/
impala-shell -f create-tbls.sql
Run a MapReduce job for RecordCount on tpch.nation:
hadoop jar /path/to/recordservice-examples-0.1.jar \ com.cloudera.recordservice.examples.mapreduce.RecordCount \
"SELECT * FROM tpch.nation" "/tmp/recordcount_output"
Start spark-shell with the RecordService JAR:
path/to/spark/bin/spark-shell \
--conf spark.recordservice.planner.hostports=planner_host:planner_port \
--jars /path/to/recordservice-examples-spark-0.1.jar
scala> import com.cloudera.recordservice.spark._
import com.cloudera.recordservice.spark._
scala> val data = sc.recordServiceRecords("select * from tpch.nation")
data: org.apache.spark.rdd.RDD[Array[org.apache.hadoop.io.Writable]] = \ RecordServiceRDD[0] at RDD at RecordServiceRDDBase.scala:57
scala> data.count()
res0: Long = 25
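To illustrate why the wrapping matters for the hadoop jar command above, here is a minimal sketch. It does not run hadoop. It passes the same words to a hypothetical helper, show_args (not part of RecordService or Hadoop), which prints each argument in brackets. It assumes a POSIX shell. When the backslash sits mid-line it escapes the following space instead of continuing the line, so the class name arrives with a leading space and would likely fail to resolve.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line, in brackets,
# so that stray whitespace becomes visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# As rendered on the page: the first backslash is followed by a space,
# so the shell treats that space as part of the next argument.
broken=$(show_args hadoop jar /path/to/recordservice-examples-0.1.jar \ com.cloudera.recordservice.examples.mapreduce.RecordCount \
  "SELECT * FROM tpch.nation" "/tmp/recordcount_output")

# Wrapped form: every backslash is the last character on its line,
# so it acts as a line continuation.
wrapped=$(show_args hadoop jar /path/to/recordservice-examples-0.1.jar \
  com.cloudera.recordservice.examples.mapreduce.RecordCount \
  "SELECT * FROM tpch.nation" "/tmp/recordcount_output")

# The fourth argument is the class name in both cases.
printf '%s\n' "$broken" | sed -n 4p
printf '%s\n' "$wrapped" | sed -n 4p
```

The first printed line shows the class name with a leading space inside the brackets and the second shows it without one, which is the difference the requested wrap would remove.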