I'd like to add namespaces to the API. There are a couple of arguments for this:
1. Users prefer simple names, which tend to collide. Our examples use "users" and "events", and we shouldn't force people to open a separate repository just to get namespacing.
2. Other databases that we interact with already do this, and future DBs will as well. We currently put all of the Hive Datasets in the same DB/namespace, "default". Worse, future implementations may try to be clever and hack namespaces into the API by interpreting "ns.tab" as namespace "ns" and table "tab", which is complicated to get right.
This will involve:
- Changing all DatasetRepository and MetadataProvider method signatures from (String name, ...) to (String namespace, String name, ...)
- This should rely on CDK-139, so changes only need to be made to the MetadataProvider.
- For compatibility in the MetadataProvider, we have a few options:
  - Add a flag, set by the builder, for old repositories
  - When the namespace is null, drop the namespace part of the path
  - Do nothing; users will need to change root from "/path/to/data" to "/path/to", and use the "data" namespace.
- This will be a good time to test the DatasetRepository API more thoroughly.
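To make the signature change concrete, here is a minimal sketch of what a namespaced metadata interface could look like. Everything in it is hypothetical: NamespacedMetadataProvider and InMemoryMetadataProvider are illustrative names, not the existing CDK interfaces, and a plain String stands in for the real descriptor type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for a namespaced MetadataProvider.
// A String stands in for the real dataset descriptor type.
interface NamespacedMetadataProvider {
    String create(String namespace, String name, String descriptor);
    String load(String namespace, String name);
    boolean exists(String namespace, String name);
    boolean delete(String namespace, String name);
    Set<String> list(String namespace);
}

// In-memory implementation, only meant to show how two namespaces
// can hold datasets with the same simple name.
class InMemoryMetadataProvider implements NamespacedMetadataProvider {
    // namespace -> (dataset name -> descriptor)
    private final Map<String, Map<String, String>> store = new HashMap<>();

    @Override
    public String create(String namespace, String name, String descriptor) {
        Map<String, String> datasets =
            store.computeIfAbsent(namespace, ns -> new HashMap<>());
        if (datasets.containsKey(name)) {
            throw new IllegalStateException(
                "Dataset already exists: " + namespace + "/" + name);
        }
        datasets.put(name, descriptor);
        return descriptor;
    }

    @Override
    public String load(String namespace, String name) {
        if (!exists(namespace, name)) {
            throw new IllegalArgumentException(
                "No such dataset: " + namespace + "/" + name);
        }
        return store.get(namespace).get(name);
    }

    @Override
    public boolean exists(String namespace, String name) {
        Map<String, String> datasets = store.get(namespace);
        return datasets != null && datasets.containsKey(name);
    }

    @Override
    public boolean delete(String namespace, String name) {
        Map<String, String> datasets = store.get(namespace);
        return datasets != null && datasets.remove(name) != null;
    }

    @Override
    public Set<String> list(String namespace) {
        Map<String, String> datasets = store.get(namespace);
        // Return a sorted copy so callers cannot modify the store.
        return datasets == null
            ? new TreeSet<>() : new TreeSet<>(datasets.keySet());
    }
}

class NamespaceSketch {
    public static void main(String[] args) {
        NamespacedMetadataProvider provider = new InMemoryMetadataProvider();
        // The same simple name can live in two namespaces without a clash.
        provider.create("analytics", "users", "analytics user schema");
        provider.create("billing", "users", "billing user schema");
        System.out.println(provider.load("analytics", "users"));
        System.out.println(provider.list("billing"));
    }
}
```

Passing the namespace as its own argument avoids any parsing of names like "ns.tab", which is the hack described above.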
I'm a fan of the last compatibility option (do nothing) because CDK is < 1.0.
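The path-related compatibility options can be compared with a small sketch. NamespacePaths and its resolve method are hypothetical helpers, and the layout (root, then namespace, then dataset name) is an assumption based on the options listed above, not the actual file system layout code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: maps (root, namespace, name) to a dataset directory.
class NamespacePaths {
    static Path resolve(Path root, String namespace, String name) {
        if (namespace == null) {
            // "Null namespace" option: drop the namespace part of the path,
            // so old repositories keep their root/name layout.
            return root.resolve(name);
        }
        return root.resolve(namespace).resolve(name);
    }
}

class NamespacePathsDemo {
    public static void main(String[] args) {
        // Old layout, reached through the null-namespace option.
        Path legacy = NamespacePaths.resolve(
            Paths.get("/path/to/data"), null, "users");

        // "Do nothing" option: the user moves the root up one level
        // and treats the old last path component as the namespace.
        Path migrated = NamespacePaths.resolve(
            Paths.get("/path/to"), "data", "users");

        // Both should point at the same directory.
        System.out.println(legacy.equals(migrated));
    }
}
```

Under these assumptions, the "do nothing" option reaches existing data with no special casing in the provider; the cost is a one-time change to each user's repository root.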