-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add TagsSet util class in the base package #20986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
AAraKKe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this! I added a some comments but I would say the most important one is to agree on the api. I feel that having different entry points can cause the intention of this change, avoiding leaking repeated tags when they should be unique, to be bypassed.
| if key not in self._data: | ||
| self._data[key] = set() | ||
| self._data[key].add(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simplify this kind of operation with
| if key not in self._data: | |
| self._data[key] = set() | |
| self._data[key].add(value) | |
| self._data.setdefault(key, set()).add(value) |
Or better yet, define self._data as a defaultdict(set) from collections. Every time you try to access a key that does not exist will return an empty set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to use the defaultdict because of the silent KeyError that would just quietly return a default value.
For simplicity, keep it again in 3 line seemed simpler and easier to read as well.
| tags_list: list[tuple[str, str]] = [] | ||
| for key, values in self._data.items(): | ||
| for val in values: | ||
| tags_list.append((key, val)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List comprehension is better for these kind of operations and type is inferred
tags_list = [(key, val) for key, values in self._data.items() for val in values]There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think list comprehensions in such scenarios could can reduce the readability of the code, not worth saving 3 lines of code in my opinion.
| def get_standalone_tags(self) -> set[str]: | ||
| return self.get_tag('') | ||
|
|
||
| def _get_tag_tuples(self, sort: bool = True) -> list[tuple[str, str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are giving the user the option to sort, we should give them the option to sort ascending or descending as well. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was waiting to check what we do with the tags arg that we pass to the base checks methods, maybe we already sort them and preprocess them somewhere.
| self._data.clear() | ||
|
|
||
| def __iter__(self) -> Iterator[tuple[str, str]]: | ||
| """Allow iteration over tags: for tag in ts or list(ts).""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean?
for tag in ts or list(ts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a comment to explain how the iter function works, I'll clarify it / remove it.
| from datadog_checks.base.utils.tagging import TagsSet | ||
|
|
||
|
|
||
| class TestTagsSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of the class? Testing with classes as namespaces is a left over from the unittest framework from python core but not necessary for pytest which relies on modules for namespacing.
Adding a class adds unnecessary burden to build the class and on reporting in which each line now will include the module the test is in and the name of the class. Pytest supports classes more as a backward compatibility feature to run tests written with the unittest freamework.
Some tests in the repo have classes but we are not using this anymore in newer implementations. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just kept the same testing style as the codebase in the base check. I'll look into it!
Are there any drawbacks of having mixed styles?
| """ | ||
| if not key: | ||
| raise ValueError("Tag key cannot be empty. Use add_standalone_tag() for standalone values.") | ||
| self._data[key] = {value} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we error here? I feel that not erroring can cause unwanted effects in which users end up overriding tags. Maybe give an option to override that by default is set to false. If override is true then we override instead of erroring.
Another question is, if the intention is to prevent the leaks we have seen in the past. Shouldn't we have a single add method the user interacts with to enforce uniqueness and only allow non-unique tags when the user specifically request it? I feel that having aadd_tag method can cause the user to go stright to that one without realizing there is a more specific add_unique_tag method if they want the tag to be unique. It is more common for a tag to be unique than not.
With a single add_tag method that enforces uniqueness, fi the user wants to add non unique tags, the code will break during testing and they will be force to add an unique flag set to False.
| def test_init_empty(self): | ||
| """Test empty initialization.""" | ||
| tags = TagsSet() | ||
| assert tags.get_tags() == [] | ||
|
|
||
| def test_add_single_tag(self): | ||
| """Test adding a single tag.""" | ||
| tags = TagsSet() | ||
| tags.add_tag('env', 'prod') | ||
| assert tags.get_tags() == ['env:prod'] | ||
|
|
||
| def test_add_multiple_values_same_key(self): | ||
| """Test adding multiple values to same key.""" | ||
| tags = TagsSet() | ||
| tags.add_tag('env', 'prod') | ||
| tags.add_tag('env', 'dev') | ||
| assert tags.get_tags() == ['env:dev', 'env:prod'] | ||
|
|
||
| def test_add_different_keys(self): | ||
| """Test adding tags with different keys.""" | ||
| tags = TagsSet() | ||
| tags.add_tag('env', 'prod') | ||
| tags.add_tag('app', 'web') | ||
| assert tags.get_tags() == ['app:web', 'env:prod'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably parameterize all these tests, they are doing the same thing in different ways.
The same with the tests below that test the same thing with different inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I prefer the simpler way, the duplicated code is minimal, and it’s not worth the added complexity and loss of readability that comes with using pytest.parametrize
|
|
||
| def test_init_empty(self): | ||
| """Test empty initialization.""" | ||
| tags = TagsSet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a fixture and we do not need to initialize an emtpy TagSet in every test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hot take (don’t hate me): I’m not a fan of fixtures. I just find it way clearer when we start with an empty tag set and build it up, nice and simple: Arrange, Act, Assert. Fixtures do all kinds of stuff behind the scenes, which makes it harder to follow sometimes. I’d use them here if setting up the TagsSet were more complex.
What does this PR do?
Motivation
RFC
Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged