Usage of Oozie Coordinator can be grouped into 3 categories:
Small: consisting of a single coordinator application with embedded dataset definitions
Medium: consisting of a single shared datasets definition and a few coordinator applications
Large: consisting of a single or multiple shared dataset definitions and several coordinator applications
Systems that fall in the medium and (especially) the large categories are usually referred to as data pipeline systems.
Oozie Coordinator definition XML schemas provide a convenient and flexible mechanism for all 3 categories of systems defined above.
For small systems: All dataset definitions and the coordinator application definition can be defined in a single XML file. The XML definition file commonly resides in its own HDFS directory.
For medium systems: A single datasets XML file defines all shared/public datasets. Each coordinator application has its own definition file, which may contain embedded/private datasets and may refer, via inclusion, to the shared datasets XML file. All the XML definition files are grouped in a single HDFS directory.
For large systems: Multiple datasets XML files define all shared/public datasets. Each coordinator application has its own definition file, which may contain embedded/private datasets and may refer, via inclusion, to multiple shared datasets XML files. XML definition files are logically grouped in different HDFS directories.
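As an illustration of the layouts described above, the following is a minimal sketch of a coordinator application definition that combines an included shared datasets file with an embedded/private dataset. All names, host names, HDFS paths and dates (logs-app, raw-logs, tmp-stats, foo:8020, /app/...) are hypothetical and chosen only for the example. A small system would simply omit the include element and keep every dataset embedded; a large system would list several include elements.

```xml
<!-- coordinator.xml: hypothetical coordinator application definition -->
<coordinator-app name="logs-app" frequency="${coord:days(1)}"
                 start="2009-01-01T00:00Z" end="2009-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <!-- Shared/public datasets, pulled in by inclusion (hypothetical path) -->
    <include>hdfs://foo:8020/app/dataset-definitions/shared-datasets.xml</include>
    <!-- Embedded/private dataset, visible only to this application -->
    <dataset name="tmp-stats" frequency="${coord:days(1)}"
             initial-instance="2009-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://foo:8020/app/logs-app/tmp/${YEAR}${MONTH}${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- "raw-logs" is assumed to be defined in the included shared file -->
    <data-in name="input" dataset="raw-logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://foo:8020/app/logs-app/workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The included file would then hold only dataset definitions. The sketch below assumes a standalone datasets element as the root of that file; the exact root element and namespace requirements should be checked against the coordinator schema version in use.

```xml
<!-- shared-datasets.xml: hypothetical shared/public dataset definitions -->
<datasets>
  <dataset name="raw-logs" frequency="${coord:days(1)}"
           initial-instance="2009-01-01T00:00Z" timezone="UTC">
    <uri-template>hdfs://foo:8020/app/logs/${YEAR}${MONTH}${DAY}</uri-template>
  </dataset>
</datasets>
```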
NOTE: Oozie Coordinator does not enforce any specific organization, grouping or naming for datasets and coordinator application definition files.
Because each coordinator application lives in a separate XML definition file, submitting, monitoring and managing coordinator jobs is simpler. Tools that support groups of jobs can be built on top of the basic, per-job commands provided by the Oozie coordinator engine.