[HUE-996] [oozie] Support shared Coordinator datasets - Cloudera Open Source

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: con.oozie
Labels:
None

Target Version:

backlog

Description

Usage of Oozie Coordinator falls into 3 categories:

Small: consisting of a single coordinator application with embedded dataset definitions
Medium: consisting of a single shared dataset definitions file and a few coordinator applications
Large: consisting of one or more shared dataset definitions files and several coordinator applications

Systems that fall in the medium and (especially) the large categories are usually referred to as data pipeline systems.

The Oozie Coordinator definition XML schemas provide a convenient and flexible mechanism for all 3 system categories defined above.

For small systems: All dataset definitions and the coordinator application definition can be defined in a single XML file. The XML definition file is commonly in its own HDFS directory.

For medium systems: A single datasets XML file defines all shared/public datasets. Each coordinator application has its own definition file, which may contain embedded/private datasets and may refer, via inclusion, to the shared datasets XML file. All the XML definition files are grouped in a single HDFS directory.

For large systems: Multiple datasets XML files define all shared/public datasets. Each coordinator application has its own definition file, which may contain embedded/private datasets and may refer, via inclusion, to multiple shared datasets XML files. XML definition files are logically grouped in different HDFS directories.

NOTE: Oozie Coordinator does not enforce any specific organization, grouping or naming for datasets and coordinator application definition files.

Keeping each coordinator application in a separate XML definition file simplifies job submission, monitoring and management. Tools that support groups of jobs can be built on top of the basic, per-job commands provided by the Oozie coordinator engine.

<datasets>
<include>hdfs://foo:8020/app/dataset-definitions/datasets.xml</include>
</datasets>
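
To support shared datasets, a tool such as the Hue Oozie editor would need to tell included dataset files apart from embedded dataset definitions. The following is a minimal sketch of that step only, not Hue's or Oozie's actual implementation. The function name split_datasets, the sample dataset "logs" and its attribute values are hypothetical, and the fragment omits the coordinator XML namespace for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical <datasets> fragment: the shared include from this ticket
# plus one embedded (private) dataset invented for illustration.
DATASETS_XML = """
<datasets>
  <include>hdfs://foo:8020/app/dataset-definitions/datasets.xml</include>
  <dataset name="logs" frequency="${coord:days(1)}"
           initial-instance="2013-01-01T00:00Z" timezone="UTC">
    <uri-template>hdfs://foo:8020/app/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
  </dataset>
</datasets>
"""


def split_datasets(xml_text):
    """Return (include_paths, embedded_dataset_names) for a <datasets> element.

    Assumes the element carries no XML namespace; a namespaced document
    would need qualified tag names in the findall() calls.
    """
    root = ET.fromstring(xml_text.strip())
    # Shared definitions are referenced by path and live in other files.
    includes = [el.text.strip() for el in root.findall("include") if el.text]
    # Embedded definitions are declared inline and identified by name.
    embedded = [el.get("name") for el in root.findall("dataset")]
    return includes, embedded


includes, embedded = split_datasets(DATASETS_XML)
print(includes)
print(embedded)
```

The sketch only lists the include paths. Resolving them would mean reading each referenced file from HDFS and parsing its dataset elements the same way.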

https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/QCI8NcRRTXE

People

Assignee:

Romain Rigaux

Reporter:

Romain Rigaux

Votes:

1

Watchers:

2

Dates

Created:

11/Jan/13 10:08 PM

Updated:

26/Feb/21 10:52 PM

Resolved:

26/Feb/21 10:52 PM