public class Datasets
extends java.lang.Object
Methods for working with Dataset instances.
URIs
All methods require a URI that identifies a dataset, view, or repository. The URI must begin with the scheme dataset:, view:, or repo:. The remainder of the URI is implementation specific, depending on the dataset scheme.
For example, the URI dataset:hive:movies/ratings references a Hive dataset named ratings in the movies namespace.
The URI view:hdfs:/user/me/movies/ratings?year=2015&month=3 references a view of an HDFS dataset named ratings in the movies namespace in the /user/me path. The view is filtered to include records from only March 2015.
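The two-part URI layout described above can be inspected with plain java.net.URI. The following is a minimal sketch, not part of the Kite API: the class name UriAnatomy and its helper methods are hypothetical, and it assumes the URI contains no escaped characters.

```java
import java.net.URI;

class UriAnatomy {
    /** Returns the leading scheme, for example "dataset", "view", or "repo". */
    static String kind(String uri) {
        return URI.create(uri).getScheme();
    }

    /** Returns the implementation-specific remainder, parsed as its own URI. */
    static URI storage(String uri) {
        return URI.create(URI.create(uri).getSchemeSpecificPart());
    }

    public static void main(String[] args) {
        String view = "view:hdfs:/user/me/movies/ratings?year=2015&month=3";
        URI inner = storage(view);
        System.out.println(kind(view));        // view
        System.out.println(inner.getScheme()); // hdfs
        System.out.println(inner.getPath());   // /user/me/movies/ratings
        System.out.println(inner.getQuery());  // year=2015&month=3
    }
}
```

The outer scheme says what kind of object the URI names; everything after it is left to the storage implementation to interpret.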
Entities
Entities are analogous to records in database terminology. The term is used in the API to emphasize that an entity can include not only primitive objects, but also complex objects such as hash maps.
Constructor and Description |
---|
Datasets() |
Modifier and Type | Method and Description |
---|---|
static <V extends> V | create(java.lang.String uri, DatasetDescriptor descriptor): Create a Dataset for the given dataset or view URI string. |
static <E,V extends> V | create(java.lang.String uri, DatasetDescriptor descriptor, java.lang.Class<E> type): Create a Dataset for the given dataset or view URI string. |
static <V extends> V | create(java.net.URI uri, DatasetDescriptor descriptor): Create a Dataset for the given dataset or view URI. |
static <E,V extends> V | create(java.net.URI uri, DatasetDescriptor descriptor, java.lang.Class<E> type): Create a Dataset for the given dataset or view URI. |
static boolean | delete(java.lang.String uri): Delete a Dataset identified by the given dataset URI string. |
static boolean | delete(java.net.URI uri): Delete a Dataset identified by the given dataset URI. |
static boolean | exists(java.lang.String uri): Check whether a Dataset identified by the given URI string exists. |
static boolean | exists(java.net.URI uri): Check whether a Dataset identified by the given URI exists. |
static java.util.Collection<java.net.URI> | list(java.lang.String uri): List the Dataset URIs in the repository identified by the URI string. |
static java.util.Collection<java.net.URI> | list(java.net.URI uri): List the Dataset URIs in the repository identified by the URI. |
static <V extends> V | load(java.lang.String uriString): Load a Dataset or View for the given URI. |
static <E,V extends> V | load(java.lang.String uriString, java.lang.Class<E> type): Load a Dataset or View for the given URI. |
static <V extends> V | load(java.net.URI uri): Load a Dataset or View for the given URI. |
static <E,V extends> V | load(java.net.URI uri, java.lang.Class<E> type): Load a Dataset or View for the given URI. |
static <D extends> D | update(java.lang.String uri, DatasetDescriptor descriptor): Update a Dataset for the given dataset or view URI string. |
static <E,D extends> D | update(java.lang.String uri, DatasetDescriptor descriptor, java.lang.Class<E> type): Update a Dataset for the given dataset or view URI string. |
static <D extends> D | update(java.net.URI uri, DatasetDescriptor descriptor): Update a Dataset for the given dataset or view URI. |
static <E,D extends> D | update(java.net.URI uri, DatasetDescriptor descriptor, java.lang.Class<E> type): Update a Dataset for the given dataset or view URI. |
public static <E,V extends> V load(java.net.URI uri, java.lang.Class<E> type)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.NullPointerException - if any arguments are null
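To illustrate how a dataset URI and a view URI relate, here is a minimal sketch that derives a view URI from a dataset URI and a set of filter constraints, using only the JDK. The class ViewUris and the method toViewUri are hypothetical and not part of the Kite API. The sketch assumes the dataset URI carries no query string of its own and that constraint names and values need no escaping.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ViewUris {
    /** Swaps the dataset: scheme for view: and appends constraints as a query string. */
    static URI toViewUri(URI datasetUri, Map<String, String> constraints) {
        if (!"dataset".equals(datasetUri.getScheme())) {
            throw new IllegalArgumentException("Not a dataset URI: " + datasetUri);
        }
        String rest = datasetUri.getRawSchemeSpecificPart();
        if (constraints.isEmpty()) {
            return URI.create("view:" + rest);
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : constraints.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return URI.create("view:" + rest + "?" + query);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the constraints in insertion order.
        Map<String, String> march2015 = new LinkedHashMap<>();
        march2015.put("year", "2015");
        march2015.put("month", "3");
        URI dataset = URI.create("dataset:hdfs:/user/me/movies/ratings");
        // prints view:hdfs:/user/me/movies/ratings?year=2015&month=3
        System.out.println(toViewUri(dataset, march2015));
    }
}
```

Passing the first URI to load would return the unfiltered dataset, while the derived URI would return a View restricted to the matching records.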
public static <V extends> V load(java.net.URI uri)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uri - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.NullPointerException - if any arguments are null
public static <E,V extends> V load(java.lang.String uriString, java.lang.Class<E> type)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.NullPointerException - if any arguments are null
public static <V extends> V load(java.lang.String uriString)
Load a Dataset or View for the given URI.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
If you use a dataset URI, load returns the unfiltered dataset. If you use a view URI, load returns a View configured to read a subset of the dataset.
Type Parameters:
V - the type of View expected
Parameters:
uriString - a Dataset or View URI
Returns:
a View for the given URI
Throws:
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.NullPointerException - if any arguments are null
public static <E,V extends> V create(java.net.URI uri, DatasetDescriptor descriptor, java.lang.Class<E> type)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
java.lang.NullPointerException - if URI or descriptor is null
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends> V create(java.net.URI uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI
Returns:
a Dataset responsible for the given URI
Throws:
java.lang.NullPointerException - if URI or descriptor is null
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,V extends> V create(java.lang.String uri, DatasetDescriptor descriptor, java.lang.Class<E> type)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
E - the type used for readers and writers created by this Dataset
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
type - a Java class that represents an entity in the dataset
Returns:
a Dataset responsible for the given URI
Throws:
java.lang.NullPointerException - if URI or descriptor is null
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <V extends> V create(java.lang.String uri, DatasetDescriptor descriptor)
Create a Dataset for the given dataset or view URI string.
create returns an empty dataset. You can use DatasetWriter to populate your dataset.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme. If the URI is a view URI, this method creates the underlying dataset and returns a view of it.
Type Parameters:
V - the type of Dataset or View expected
Parameters:
uri - a Dataset or View URI string
Returns:
a Dataset responsible for the given URI
Throws:
java.lang.NullPointerException - if URI or descriptor is null
DatasetExistsException - if a Dataset for the given URI already exists
IncompatibleSchemaException - if the schema is not compatible with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends> D update(java.net.URI uri, DatasetDescriptor descriptor, java.lang.Class<E> type)
Update a Dataset for the given dataset or view URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor descriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset or View URI
type - a Java class that represents an entity in the dataset
Returns:
a Dataset for the given URI; does not return a view, even when you pass a view URI
Throws:
java.lang.NullPointerException - if URI is null
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends> D update(java.net.URI uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset or view URI.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor descriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset or View URI
Returns:
a Dataset for the given URI; does not return a view, even when you pass a view URI
Throws:
java.lang.NullPointerException - if URI is null
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <E,D extends> D update(java.lang.String uri, DatasetDescriptor descriptor, java.lang.Class<E> type)
Update a Dataset for the given dataset or view URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor descriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
E - the type used for readers and writers created by this Dataset
D - the type of Dataset expected
Parameters:
uri - a Dataset or View URI string
type - a Java class that represents an entity in the dataset
Returns:
a Dataset for the given URI; does not return a view, even when you pass a view URI
Throws:
java.lang.NullPointerException - if URI is null
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static <D extends> D update(java.lang.String uri, DatasetDescriptor descriptor)
Update a Dataset for the given dataset or view URI string.
You can add columns, remove columns, or change the data type of columns in your dataset, provided you don't attempt a change that is incompatible with written data. Avro defines rules for compatible schema evolution. See Schema Evolution.
This method updates the dataset descriptor, so you can also add or change properties.
The recommended way to update a dataset descriptor is to build it based on an existing descriptor. Use DatasetDescriptor.Builder(DatasetDescriptor descriptor) to build a DatasetDescriptor based on an existing instance.
You cannot change a dataset format or partition strategy.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Type Parameters:
D - the type of Dataset expected
Parameters:
uri - a Dataset or View URI string
Returns:
a Dataset for the given URI; does not return a view, even when you pass a view URI
Throws:
java.lang.NullPointerException - if URI is null
DatasetNotFoundException - if there is no dataset for the given URI
java.lang.UnsupportedOperationException - if descriptor updates are not supported by the implementation
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
IncompatibleSchemaException - if the schema is not compatible with previous schemas, or with existing datasets with shared storage (for example, in the same HBase table)
public static boolean delete(java.net.URI uri)
Delete a Dataset identified by the given dataset URI.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted. The dataset's metadata is not changed. This can throw an UnsupportedOperationException if the delete requires additional work. For example, if some, but not all, of the data in an underlying data file must be removed, then the implementation is allowed to reject the deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes, and under what conditions it rejects them.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if any data or metadata is removed, false otherwise
Throws:
java.lang.NullPointerException - if URI is null
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
public static boolean delete(java.lang.String uri)
Delete a Dataset identified by the given dataset URI string.
When you call this method using a dataset URI, both data and metadata are deleted. After you call this method, the dataset no longer exists, unless an exception is thrown.
When you call this method using a view URI, data in that view is deleted. The dataset's metadata is not changed. This can throw an UnsupportedOperationException if the delete requires additional work. For example, if some, but not all, of the data in an underlying data file must be removed, then the implementation is allowed to reject the deletion rather than copy the remaining records to a new file.
An implementation must document under what conditions it accepts deletes, and under what conditions it rejects them.
URIs must begin with dataset: or view:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if any data or metadata is removed, false otherwise
Throws:
java.lang.NullPointerException - if URI is null
ConcurrentSchemaModificationException - if the Dataset schema is updated concurrently
public static boolean exists(java.net.URI uri)
Check whether a Dataset identified by the given URI exists.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI
Returns:
true if the dataset exists, false otherwise
Throws:
java.lang.NullPointerException - if URI is null
public static boolean exists(java.lang.String uri)
Check whether a Dataset identified by the given URI string exists.
URIs must begin with dataset:. The remainder of the URI is implementation specific, depending on the dataset scheme.
Parameters:
uri - a Dataset URI string
Returns:
true if the dataset exists, false otherwise
public static java.util.Collection<java.net.URI> list(java.net.URI uri)
List the Dataset URIs in the repository identified by the URI.
URI formats are defined by Dataset implementations. The repository URIs you pass to this method must begin with repo:. For example, to list the Dataset URIs for the Hive repository, provide the URI repo:hive.
Parameters:
uri - a DatasetRepository URI
Returns:
the Dataset URIs in the DatasetRepository
public static java.util.Collection<java.net.URI> list(java.lang.String uri)
List the Dataset URIs in the repository identified by the URI string.
URI formats are defined by Dataset implementations. The repository URIs you pass to this method must begin with repo:. For example, to list the Dataset URIs for the Hive repository, provide the URI repo:hive.
Parameters:
uri - a DatasetRepository URI string
Returns:
the Dataset URIs in the DatasetRepository
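The scheme rules stated for each method on this page can be collected into a single pre-flight check. The following is a minimal sketch using only the JDK: the class SchemeCheck and its Operation enum are hypothetical and not part of the Kite API, and the accepted schemes are taken only from the method descriptions above, so an implementation may accept more.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

class SchemeCheck {
    /** One constant per Datasets method, with the URI schemes its description names. */
    enum Operation {
        LOAD("dataset", "view"),
        CREATE("dataset", "view"),
        UPDATE("dataset", "view"),
        DELETE("dataset", "view"),
        EXISTS("dataset"),
        LIST("repo");

        private final List<String> schemes;

        Operation(String... schemes) {
            this.schemes = Arrays.asList(schemes);
        }

        /** Returns true if the URI's leading scheme is one this operation documents. */
        boolean accepts(URI uri) {
            return schemes.contains(uri.getScheme());
        }
    }

    public static void main(String[] args) {
        URI repo = URI.create("repo:hive");
        System.out.println(Operation.LIST.accepts(repo));   // true
        System.out.println(Operation.EXISTS.accepts(repo)); // false
    }
}
```

A check like this only validates the leading scheme; whether the remainder of the URI is valid is still decided by the dataset implementation.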