Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.8.1
- Fix Version/s: 0.9.0
- Component/s: Data Module
- Labels: None
Description
FileSystemDatasetWriter#flush does not call hflush on the underlying HDFS output stream. We should fix this so that flushed entities are visible to new readers.
We might also add a sync method to DatasetWriter that calls hsync on the stream to guarantee that the entities have been written to disk.
ParquetFileSystemDatasetWriter does not support flushing or syncing; the HBase implementation does. We should document which DatasetWriter implementations support flush and sync.
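A rough sketch of the flush/sync split described above, for illustration only. It does not use Hadoop: `SyncableStream` is a local stand-in for the `hflush()`/`hsync()` pair that Hadoop exposes on its HDFS output stream, and `RecordingStream` and `SketchDatasetWriter` are hypothetical names, not the actual Kite classes. The in-memory stream only models the intended semantics (hflush makes bytes visible to new readers, hsync additionally makes them durable).

```java
import java.io.ByteArrayOutputStream;
import java.io.Flushable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Local stand-in (assumption) for the hflush/hsync pair on an HDFS output stream.
interface SyncableStream {
    void hflush() throws IOException; // make written bytes visible to new readers
    void hsync() throws IOException;  // additionally push them to disk
}

// Hypothetical in-memory stream that models visibility and durability as counters.
class RecordingStream extends OutputStream implements SyncableStream {
    private final ByteArrayOutputStream clientBuffer = new ByteArrayOutputStream();
    int visibleBytes = 0; // bytes a new reader would see
    int durableBytes = 0; // bytes assumed to be on disk

    @Override
    public void write(int b) {
        clientBuffer.write(b);
    }

    @Override
    public void hflush() {
        visibleBytes = clientBuffer.size();
    }

    @Override
    public void hsync() {
        visibleBytes = clientBuffer.size();
        durableBytes = clientBuffer.size();
    }
}

// Hypothetical writer: flush() delegates to hflush(), sync() delegates to hsync().
class SketchDatasetWriter<S extends OutputStream & SyncableStream> implements Flushable {
    private final S out;

    SketchDatasetWriter(S out) {
        this.out = out;
    }

    void write(String entity) throws IOException {
        // One entity per line; a real writer would use its own serialization.
        out.write((entity + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void flush() throws IOException {
        out.flush();  // drain any local buffering first
        out.hflush(); // then make the data visible to new readers
    }

    void sync() throws IOException {
        out.flush();
        out.hsync();  // visibility plus durability
    }
}
```

In this model, an entity written but not flushed is invisible to readers, a flushed entity is visible but not yet durable, and a synced entity is both. A writer that cannot offer either guarantee (as noted for the Parquet writer) would need to say so in its documentation rather than silently accept the call.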