Resolution: Not A Bug
Affects Version/s: 3.9.0
Fix Version/s: None
10-node cluster of
Cloudera Enterprise 5.5.1 (#8 built by jenkins on 20151201-1822 git: 2a7dfe22d921bef89c7ee3c2981cb4c1dc43de7b)
Each node with 16 GB memory.
A Spark App Workflow created in Hue cannot save to HDFS through the DataFrame API.
The issue was found when trying to save a NaiveBayesModel through the MLlib API.
An OutOfMemoryError appeared in the log, but after Hue refreshed the log page
the log could no longer be found.
Inspection showed that the model's save function internally uses the DataFrame API's "write.parquet".
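For reference, a minimal sketch of the kind of app that triggers the problem, assuming the Spark 1.5 MLlib API shipped with CDH 5.5; the HDFS paths and app name are placeholders, not taken from the actual scripts:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.util.MLUtils

object SaveModelApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveModelApp"))
    // Placeholder input path in LIBSVM format.
    val data = MLUtils.loadLibSVMFile(sc, "hdfs:///tmp/sample_libsvm_data.txt")
    val model = NaiveBayes.train(data)
    // NaiveBayesModel.save converts the model to a DataFrame and writes it
    // with the Parquet writer, which is where the failure surfaces.
    model.save(sc, "hdfs:///tmp/nbModel")
    sc.stop()
  }
}
```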
Three scripts were then created to test the DataFrame save functions "write.save", "write.json" and "write.parquet". The scripts were built with the Scala IDE installed in the Cloudera 5.5.0 QuickStart VM, using Scala 2.10.6 and the Cloudera Java 7 JDK in the VM.
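A hedged sketch of what the three test scripts do, assuming the Spark 1.5 DataFrameWriter API; the class name, record schema, and output paths are illustrative placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameSaveTest {
  case class Record(id: Int, value: String)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DataFrameSaveTest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Tiny in-memory DataFrame; the real scripts would differ, but any
    // write through these three paths reproduces the symptom.
    val df = sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF()

    df.write.save("hdfs:///tmp/out-default")   // default format (Parquet)
    df.write.json("hdfs:///tmp/out-json")
    df.write.parquet("hdfs:///tmp/out-parquet")

    sc.stop()
  }
}
```

Each of the three calls was exercised in its own script so the failing save path could be isolated.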
The apps run properly when submitted from the console with spark-submit, but an OutOfMemoryError occurred when the workflows were created in Hue (both in the QuickStart VM and on the real cluster).
Running the app with larger executor memory was also tested, but that was found only to delay the OutOfMemoryError (there were more heartbeats in the stdout log).
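For completeness, a sketch of the console invocation used in the memory test; the class name, jar path, master setting, and memory value are placeholders rather than the actual values used:

```shell
# Submitting from the console works; the same app launched from a Hue
# workflow OOMs even with --executor-memory raised.
spark-submit \
  --class DataFrameSaveTest \
  --master yarn-cluster \
  --executor-memory 4g \
  dataframe-save-test.jar
```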