Kite SDK
High-level Tools

Kite's API and tools are built around datasets. Datasets are identified by unique URIs, such as dataset:hive:ratings. A dataset is a consistent interface for working with your Hadoop data. You control implementation details, such as whether to use Avro or Parquet format and HDFS or HBase storage, but you only have to tell Kite what to do; Kite handles the implementation for you.

    kite-dataset csv-import ratings.csv \
      dataset:hbase:zk/ratings
    Added 1000000 records to dataset "ratings"

Kite's command-line interface helps you manage datasets with pre-built tasks like creating datasets, migrating schemas, and loading data. It also helps you configure Kite and other Hadoop projects. Get started with Kite's CSV tutorial.

The Kite Data API provides programmatic access to datasets. Using the API, you can build applications that directly interact with Kite datasets.

    View latest = Datasets.load(uri)
        .from("time", startOfToday).to("time", now);
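The snippet above uses `startOfToday` and `now` without defining them. As a minimal sketch, assuming the `time` field holds epoch milliseconds (as in the sample ratings data) and assuming both bounds of the range are meant to be inclusive, the two values could be computed with `java.time` as follows. The class and method names here are hypothetical and are not part of Kite.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Kite: computes the bounds used by a
// "today so far" time window over epoch-millisecond timestamps.
final class TimeWindowSketch {

    // Midnight at the start of the day that contains `now`, in the given zone.
    static long startOfDayMillis(Instant now, ZoneId zone) {
        return now.atZone(zone)
                  .toLocalDate()
                  .atStartOfDay(zone)
                  .toInstant()
                  .toEpochMilli();
    }

    // Assumption: both ends of the window are inclusive.
    static boolean inWindow(long time, long from, long to) {
        return time >= from && time <= to;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        long startOfToday = startOfDayMillis(now, ZoneOffset.UTC);
        System.out.println("window: " + startOfToday + " .. " + now.toEpochMilli());
        System.out.println(inWindow(now.toEpochMilli(), startOfToday, now.toEpochMilli()));
    }
}
```

The time zone is passed explicitly because "today" depends on it; UTC is used here only to keep the example deterministic.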
Low-level Control

When you create a dataset, you control your data layout, record schema, and other options with straightforward configuration. Then you can focus on building your application, while Kite handles data storage for you. Kite automatically partitions records when writing, and prunes partitions when reading.

    time,rating,user_id,item_id
    1412361369702,4,34,18865
    ...

    [
      {"type": "year", "source": "time"},
      {"type": "month", "source": "time"},
      {"type": "day", "source": "time"}
    ]

    datasets/
    |---ratings/
        |---year=2014/
            |---month=09/
            |   |--- day=01/
            |   |--- ...
            |   |--- day=30/
            |---month=10/
            |   |--- day=01/
            |   |--- ...
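To make the year/month/day layout concrete, here is a minimal sketch of how a record's `time` value could map to a partition directory, and how a reader could skip partitions that cannot match a time range. It is an illustration only, not Kite's implementation: the class and method names are hypothetical, and it assumes `time` is epoch milliseconds interpreted in UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Locale;

// Hypothetical illustration of year/month/day partitioning; not Kite code.
final class PartitionPathSketch {

    // Derives the partition directory for a record from its timestamp (UTC assumed).
    static String partitionPath(long timeMillis) {
        LocalDate d = Instant.ofEpochMilli(timeMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return String.format(Locale.ROOT, "year=%d/month=%02d/day=%02d",
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    // Pruning check: can a day partition hold any record in [fromMillis, toMillis]?
    static boolean mayContain(LocalDate day, long fromMillis, long toMillis) {
        long dayStart = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long nextDayStart = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return dayStart <= toMillis && fromMillis < nextDayStart;
    }

    public static void main(String[] args) {
        long time = 1412361369702L; // the sample record's timestamp
        System.out.println(partitionPath(time)); // prints year=2014/month=10/day=03
        // A query for that instant only needs the 2014-10-03 partition.
        System.out.println(mayContain(LocalDate.of(2014, 10, 3), time, time)); // prints true
        System.out.println(mayContain(LocalDate.of(2014, 9, 30), time, time)); // prints false
    }
}
```

Because the directory name encodes the date, the pruning check needs only the partition's name and the query bounds; no data files are opened for partitions that are ruled out.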
Configuration-based Transformation

Kite Morphlines is a flexible way to express data transformations as configuration.