CDH (READ-ONLY) / DISTRO-659

Does Cloudera Search support file systems other than HDFS?

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: search-1.0.0
    • Fix Version/s: None
    • Component/s: Search
    • Labels:
      None
    • Environment:
      CDH 5.1.2 + search-1.0.0 + solr-4.4 + Intel Enterprise Edition for Lustre, version 2.2

      Description

      Hi,
      I am testing Cloudera components with the Lustre file system, following the Cloudera certification instructions.
      So far, Lustre works with CDH, HBase, Hive, Pig, Mahout, and Spark.

      Recently I encountered the following issues while testing Cloudera Search. They raise the same question I had for Impala (see https://issues.cloudera.org/browse/IMPALA-1404): does Cloudera Search support only HDFS?

      My reference for Cloudera Search is the "Cloudera Search User Guide" at http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/PDF/Cloudera-Search-User-Guide.pdf

      Issue 1:
      This happened when I created my first Solr collection with the following command:

      $ solrctl collection --create collection1 -s 1
      Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
      Server: Apache-Coyote/1.1
      Content-Type: application/xml;charset=UTF-8
      Transfer-Encoding: chunked
      Date: Wed, 29 Oct 2014 06:27:51 GMT
      
      <?xml version="1.0" encoding="UTF-8"?>
      
      <response>
      <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">4797</int>
      </lst>
      <lst name="failure">
      <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'collection1_shard1_replica1': Unable to create core: collection1_shard1_replica1 Caused by: /lustre:/solr/collection1/core_node1/data/tlog</str>
      </lst>
      </response>
      

      The directory exists and is accessible to Solr, and I found nothing wrong in the Lustre logs.
      Then, after checking solrconfig.xml, I disabled the updateLog feature, and collection creation worked. But the Solr documentation says realtime-get currently relies on the update log feature. So, is disabling updateLog the right fix for this problem? Why can't the tlog directory be created?
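For reference, the workaround described above amounts to commenting out the updateLog element in solrconfig.xml. A sketch of what that looks like in a stock Solr 4.x config (the exact dir variable in your config may differ):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commenting out updateLog disables the transaction log (tlog),
       which sidesteps the tlog-creation failure on Lustre. Note this
       also disables realtime-get and weakens SolrCloud recovery, so
       it is a workaround, not a fix. -->
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```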

      Issue 2:
      I moved on to batch indexing with MapReduce, still with the updateLog feature disabled. This time I hit the following issue:

      hadoop --config /etc/hadoop/conf jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool -D 'mapred.child.java.opts=-Xmx500m' --log4j /usr/share/doc/search*/examples/solr-nrt/log4j.properties --morphline-file /usr/share/doc/search*/examples/solr-nrt/test-morphlines/tutorialReadAvroContainer.conf --output-dir lustre:///user/solr/outdir --verbose --go-live --zk-host centos6-hadoop:2181/solr --collection collection3 lustre:/user/solr/indir
      ...
      1038 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing 2 files using 2 real mappers into 2 reducers
      Error: org.kitesdk.morphline.api.MorphlineRuntimeException: java.lang.IllegalArgumentException: Host must not be null: lustre:/user/solr/indir/sample-statuses-20120906-141433.avro
      	at org.kitesdk.morphline.base.FaultTolerance.handleException(FaultTolerance.java:73)
      	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:213)
      	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:86)
      	at org.apache.solr.hadoop.morphline.MorphlineMapper.map(MorphlineMapper.java:54)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
      Caused by: java.lang.IllegalArgumentException: Host must not be null: lustre:/user/solr/indir/sample-statuses-20120906-141433.avro
      	at org.apache.solr.hadoop.PathParts.<init>(PathParts.java:61)
      	at org.apache.solr.hadoop.morphline.MorphlineMapRunner.map(MorphlineMapRunner.java:185)
      	... 10 more
      

      This looks like a path resolution problem in Solr. Since Lustre is a parallel distributed file system, when we use it instead of HDFS the URI carries no hostname (lustre:///path, unlike hdfs://hostname:port/path).
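The failure can be reproduced with plain java.net.URI, which is roughly what the path handling in org.apache.solr.hadoop.PathParts builds on: a lustre:/// URI has an empty authority, so getHost() returns null, while an hdfs:// URI names a host. A minimal sketch (the class name and the hostname "namenode" are illustrative, not from Search):

```java
import java.net.URI;

// Demonstrates why PathParts rejects lustre:/// paths with
// "Host must not be null": the URI has no authority component.
public class HostCheckDemo {
    static String hostOf(String uri) {
        // getHost() is null when the URI carries no authority/host.
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        System.out.println(hostOf("hdfs://namenode:8020/user/solr/indir")); // "namenode"
        System.out.println(hostOf("lustre:///user/solr/indir"));            // null
    }
}
```

This suggests the check would need to tolerate a null host for filesystems that, like Lustre, are mounted locally on every node and need no hostname in the URI.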

      So, if Cloudera Search is run on Lustre, how can the above issues be fixed?

      Thanks in advance!


        People

        • Assignee: whoschek Wolfgang Hoschek
        • Reporter: emoly.liu liu ying
