[RS-70] update cmd in 'Running a Job Using RecordService' section - Cloudera Open Source

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.2.0
Fix Version/s: 0.2.0
Component/s: Doc
Labels:
None

Description

In the 'Running a Job Using RecordService' section of the installOnCluster page, several commands are too long to fit on one line and need to be wrapped:

Log in to one of the nodes in your cluster, and load test data:
wget -q --no-clobber \
https://s3-us-west-1.amazonaws.com/recordservice-vm/tpch.tar.gz

tar -xzf tpch.tar.gz
hadoop fs -mkdir -p /test-warehouse/tpch.nation
hadoop fs -put -f tpch/nation/* /test-warehouse/tpch.nation/
impala-shell -f create-tbls.sql

Run a MapReduce job for RecordCount on tpch.nation:
hadoop jar /path/to/recordservice-examples-0.1.jar \
com.cloudera.recordservice.examples.mapreduce.RecordCount \
"SELECT * FROM tpch.nation" "/tmp/recordcount_output"

Start spark-shell with the RecordService JAR:
path/to/spark/bin/spark-shell \
--conf spark.recordservice.planner.hostports=planner_host:planner_port \
--jars /path/to/recordservice-examples-spark-0.1.jar

scala> import com.cloudera.recordservice.spark._
import com.cloudera.recordservice.spark._

scala> val data = sc.recordServiceRecords("select * from tpch.nation")
data: org.apache.spark.rdd.RDD[Array[org.apache.hadoop.io.Writable]] = \
RecordServiceRDD[0] at RDD at RecordServiceRDDBase.scala:57

scala> data.count()
res0: Long = 25
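
The wrapped commands above only work if each backslash is the last character on its line. As a minimal sketch of why that matters (show_args is a hypothetical helper introduced here for illustration, not part of RecordService or Hadoop), the following compares a trailing backslash with a backslash that ends up in the middle of a line:

```shell
# show_args is a hypothetical helper: it prints each argument it receives in brackets.
show_args() { printf '[%s]' "$@"; echo; }

# Backslash as the last character on the line: the shell joins the two lines
# and still sees two separate arguments.
wrapped=$(show_args one \
  two)

# Backslash followed by a space in the middle of a line: the space is escaped
# and becomes part of the second argument.
broken=$(show_args one \ two)

echo "wrapped: $wrapped"
echo "broken:  $broken"
```

In the second case the argument carries a leading space, so a class name or path passed this way would likely not be resolved as intended.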

People

Assignee:

Dennis Dawson

Reporter:

Li Li

Votes:

0

Watchers:

2

Dates

Created:

09/Dec/15 11:50 PM

Updated:

11/Dec/15 12:52 AM

Resolved:

11/Dec/15 12:52 AM