Kite SDK

Kite is a high-level data layer for Hadoop: an API and a set of tools that speed up development. You configure how Kite stores your data in Hadoop, rather than building and maintaining that infrastructure yourself.

High-level Tools

Kite's API and tools are built around datasets. Datasets are identified by unique URIs, such as dataset:hive:ratings.
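To make the URI structure concrete, here is a minimal sketch that splits a dataset URI into its storage scheme and location using only `java.net.URI`. The class and method names (`DatasetUriSketch`, `split`) are hypothetical, and this is not Kite's own parser; it only assumes that the part after `dataset:` is itself a URI whose scheme names the storage backend.

```java
import java.net.URI;

/** Illustrative only: splits a dataset URI with java.net.URI. Not Kite's own parser. */
class DatasetUriSketch {

    /** Returns {storage scheme, location}, for example {"hive", "ratings"}. */
    static String[] split(String datasetUri) {
        URI outer = URI.create(datasetUri);
        if (!"dataset".equals(outer.getScheme())) {
            throw new IllegalArgumentException("Not a dataset URI: " + datasetUri);
        }
        // Assumption: the scheme-specific part is itself a URI, such as "hive:ratings".
        URI inner = URI.create(outer.getSchemeSpecificPart());
        return new String[] { inner.getScheme(), inner.getSchemeSpecificPart() };
    }

    public static void main(String[] args) {
        String[] examples = { "dataset:hive:ratings", "dataset:hbase:zk/ratings" };
        for (String uri : examples) {
            String[] parts = split(uri);
            System.out.println(uri + " -> storage=" + parts[0] + ", location=" + parts[1]);
        }
    }
}
```

Both example URIs come from this page; anything beyond the two-part split (for instance, how a location such as `zk/ratings` is interpreted) is left to Kite.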

A dataset is a consistent interface for working with your Hadoop data. You control implementation details, such as whether to use Avro or Parquet format and HDFS or HBase storage, but you only have to tell Kite what you want; Kite handles the implementation for you.

kite-dataset csv-import ratings.csv \
  dataset:hbase:zk/ratings
Added 1000000 records to dataset "ratings"

Kite's command-line interface helps you manage datasets with pre-built tasks like creating datasets, migrating schemas, and loading data. It also helps you configure Kite and other Hadoop projects.

Get started with Kite's CSV tutorial.

The Kite Data API provides programmatic access to datasets. Using the API, you can build applications that interact directly with Kite datasets.

View latest = Datasets.load(uri)
  .from("time", startOfToday).to("time", now);
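The snippet above leaves `startOfToday` and `now` undefined. One way to compute them with `java.time` is sketched below, assuming the `time` field holds epoch milliseconds (as in the sample record later on this page) and that "today" means the current UTC day. The class name `TimeBoundsSketch` and the choice of UTC are assumptions for illustration, not part of the Kite API.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Illustrative helpers for the bounds of a "today so far" time range. */
class TimeBoundsSketch {

    /** Epoch milliseconds at 00:00 UTC of the clock's current day. */
    static long startOfToday(Clock clock) {
        return LocalDate.now(clock.withZone(ZoneOffset.UTC))
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    /** Current time in epoch milliseconds, read from the same clock. */
    static long now(Clock clock) {
        return clock.millis();
    }

    public static void main(String[] args) {
        Clock clock = Clock.systemUTC();
        System.out.println("startOfToday = " + startOfToday(clock));
        System.out.println("now          = " + now(clock));
    }
}
```

Passing a `Clock` rather than reading the system time directly keeps the bounds reproducible in tests, where `Clock.fixed(...)` can stand in for the real clock.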

Learn more about Kite datasets.

Low-level Control

When you create a dataset, you control your data layout, record schema, and other options with straightforward configuration. Then you can focus on building your application, while Kite handles data storage for you. Kite automatically partitions records when writing, and prunes partitions when reading.

time,rating,user_id,item_id
1412361369702,4,34,18865
...

[
  {"type": "year", "source": "time"},
  {"type": "month", "source": "time"},
  {"type": "day", "source": "time"}
]

datasets/
|---ratings/
    |---year=2014/
        |---month=09/
        |   |--- day=01/
        |   |--- ...
        |   |--- day=30/
        |---month=10/
        |   |--- day=01/
        |   |--- ...
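The relationship between a record's `time` value, the year/month/day strategy, and the directory layout above can be sketched in plain Java. This is an illustration of the idea, not Kite's implementation: it assumes `time` is epoch milliseconds interpreted in UTC and that month and day are zero-padded as in the listing. The class and method names (`PartitionPathSketch`, `partitionPath`, `partitionsToScan`) are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: maps timestamps to year/month/day partition paths. */
class PartitionPathSketch {

    /** Converts epoch milliseconds to a calendar date, assuming UTC. */
    private static LocalDate utcDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    private static String pathFor(LocalDate date) {
        return String.format(Locale.ROOT, "year=%d/month=%02d/day=%02d",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth());
    }

    /** Writing: the partition a record with this "time" value would land in. */
    static String partitionPath(long epochMillis) {
        return pathFor(utcDate(epochMillis));
    }

    /** Reading: only day partitions overlapping [fromMillis, toMillis] need scanning. */
    static List<String> partitionsToScan(long fromMillis, long toMillis) {
        LocalDate last = utcDate(toMillis);
        List<String> paths = new ArrayList<>();
        for (LocalDate d = utcDate(fromMillis); !d.isAfter(last); d = d.plusDays(1)) {
            paths.add(pathFor(d));
        }
        return paths;
    }

    public static void main(String[] args) {
        // "time" value of the sample record shown above.
        System.out.println(partitionPath(1412361369702L));
        // Every other day directory can be skipped for this range.
        System.out.println(partitionsToScan(1412035200000L, 1412361369702L));
    }
}
```

Under these assumptions, the sample record's timestamp falls in `year=2014/month=10/day=03`, and a query over a time range touches only the day directories inside that range, which is the effect partition pruning is meant to achieve.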

Learn more about configuring with Kite.

Configuration-based Transformation

Kite Morphlines is a flexible way to express data transformations as configuration.

Go to the Morphlines Reference Guide.

Explore the Kite SDK Documentation