Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Bug
- Affects Version/s: search-1.0.0
- Fix Version/s: None
- Component/s: Search
- Labels: None
- Environment: CDH 5.1.2 + search-1.0.0 + Solr 4.4 + Intel Enterprise Edition for Lustre*, version 2.2
Description
Hi,
I am testing Cloudera components on the Lustre file system, following the Cloudera certification instructions.
So far, Lustre works with CDH, HBase, Hive, Pig, Mahout, and Spark.
Recently, I encountered the following issues while testing Cloudera Search. This raises the same question that came up with Impala (see https://issues.cloudera.org/browse/IMPALA-1404): does Cloudera Search only support HDFS?
My reference for Cloudera Search is the "Cloudera Search User Guide" at http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/PDF/Cloudera-Search-User-Guide.pdf
Issue 1:
It happened when I created my first Solr collection by running the following command:

$ solrctl collection --create collection1 -s 1
Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 29 Oct 2014 06:27:51 GMT

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4797</int>
  </lst>
  <lst name="failure">
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'collection1_shard1_replica1': Unable to create core: collection1_shard1_replica1 Caused by: /lustre:/solr/collection1/core_node1/data/tlog</str>
  </lst>
The directory path is correct and accessible by Solr, and I found nothing wrong in the Lustre log.
Then, after checking solrconfig.xml, I disabled the updateLog feature, and that did work. But later I saw the Solr documentation say that real-time get currently relies on the update log feature. So, is disabling updateLog the right fix for this problem? Why can't the tlog directory be created?
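For reference, the change amounts to commenting out the updateLog element inside the updateHandler section of solrconfig.xml. This is a sketch against a stock Solr 4.x config; the exact surrounding content of your solrconfig.xml may differ:

```xml
<!-- solrconfig.xml: inside the <updateHandler> section -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commented out to work around the tlog creation failure on Lustre.
       Note: real-time get depends on the update log, so this is a
       workaround, not a fix. -->
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```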
Issue 2:
I moved on to batch indexing using MapReduce, without the updateLog feature. This time I hit the following issue:

hadoop --config /etc/hadoop/conf jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  -D 'mapred.child.java.opts=-Xmx500m' \
  --log4j /usr/share/doc/search*/examples/solr-nrt/log4j.properties \
  --morphline-file /usr/share/doc/search*/examples/solr-nrt/test-morphlines/tutorialReadAvroContainer.conf \
  --output-dir lustre:///user/solr/outdir --verbose --go-live \
  --zk-host centos6-hadoop:2181/solr --collection collection3 lustre:/user/solr/indir
...
1038 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 2 files using 2 real mappers into 2 reducers
Error: org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.IllegalArgumentException: Host must not be null: lustre:/user/solr/indir/sample-statuses-20120906-141433.avro
	at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.IllegalArgumentException: Host must not be null: lustre:/user/solr/indir/sample-statuses-20120906-141433.avro
	at org.apache.solr.hadoop.PathParts.<init>(PathParts.java:61)
	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:185)
	... 10 more
This looks like a path-resolution problem in Solr. Since Lustre is a parallel distributed file system, when we use it in place of HDFS we don't need a hostname in the URI (lustre:///path, unlike hdfs://hostname:port/path).
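The "Host must not be null" error is consistent with how java.net.URI parses the two path styles. A minimal sketch (the host and file names are hypothetical) showing that a lustre:/// URI carries no host component, which is the condition PathParts rejects:

```java
import java.net.URI;

public class UriHostDemo {
    // Returns the host component of a URI string, or null if absent.
    static String hostOf(String uri) throws Exception {
        return new URI(uri).getHost();
    }

    public static void main(String[] args) throws Exception {
        // HDFS-style URI: the authority part carries host:port.
        System.out.println(hostOf("hdfs://namenode:8020/user/solr/indir/file.avro"));
        // prints: namenode

        // Lustre-style URI: the authority is empty, so the host is null --
        // exactly what PathParts rejects with "Host must not be null".
        System.out.println(hostOf("lustre:///user/solr/indir/file.avro"));
        // prints: null
    }
}
```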
So, if Cloudera Search can run on Lustre, how do I fix the above issues?
Thanks in advance!