Skip to content

Conversation

@NouemanKHAL
Copy link
Member

@NouemanKHAL NouemanKHAL commented Aug 8, 2025

What does this PR do?

Motivation

RFC

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@codecov
Copy link

codecov bot commented Aug 8, 2025

Codecov Report

❌ Patch coverage is 98.23529% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.41%. Comparing base (3c6079e) to head (02c2258).
⚠️ Report is 69 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@AAraKKe AAraKKe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this! I added a some comments but I would say the most important one is to agree on the api. I feel that having different entry points can cause the intention of this change, avoiding leaking repeated tags when they should be unique, to be bypassed.

Comment on lines +61 to +63
if key not in self._data:
self._data[key] = set()
self._data[key].add(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this kind of operation with

Suggested change
if key not in self._data:
self._data[key] = set()
self._data[key].add(value)
self._data.setdefault(key, set()).add(value)

Or better yet, define self._data as a defaultdict(set) from collections. Every time you try to access a key that does not exist will return an empty set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to use the defaultdict because of the silent KeyError that would just quietly return a default value.
For simplicity, keep it again in 3 line seemed simpler and easier to read as well.

Comment on lines +93 to +96
tags_list: list[tuple[str, str]] = []
for key, values in self._data.items():
for val in values:
tags_list.append((key, val))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List comprehension is better for these kind of operations and type is inferred

tags_list = [(key, val) for key, values in self._data.items() for val in values]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think list comprehensions in such scenarios could can reduce the readability of the code, not worth saving 3 lines of code in my opinion.

def get_standalone_tags(self) -> set[str]:
return self.get_tag('')

def _get_tag_tuples(self, sort: bool = True) -> list[tuple[str, str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are giving the user the option to sort, we should give them the option to sort ascending or descending as well. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was waiting to check what we do with the tags arg that we pass to the base checks methods, maybe we already sort them and preprocess them somewhere.

self._data.clear()

def __iter__(self) -> Iterator[tuple[str, str]]:
"""Allow iteration over tags: for tag in ts or list(ts)."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

for tag in ts or list(ts)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a comment to explain how the iter function works, I'll clarify it / remove it.

from datadog_checks.base.utils.tagging import TagsSet


class TestTagsSet:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of the class? Testing with classes as namespaces is a left over from the unittest framework from python core but not necessary for pytest which relies on modules for namespacing.

Adding a class adds unnecessary burden to build the class and on reporting in which each line now will include the module the test is in and the name of the class. Pytest supports classes more as a backward compatibility feature to run tests written with the unittest freamework.

Some tests in the repo have classes but we are not using this anymore in newer implementations. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just kept the same testing style as the codebase in the base check. I'll look into it!
Are there any drawbacks of having mixed styles?

"""
if not key:
raise ValueError("Tag key cannot be empty. Use add_standalone_tag() for standalone values.")
self._data[key] = {value}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we error here? I feel that not erroring can cause unwanted effects in which users end up overriding tags. Maybe give an option to override that by default is set to false. If override is true then we override instead of erroring.

Another question is, if the intention is to prevent the leaks we have seen in the past. Shouldn't we have a single add method the user interacts with to enforce uniqueness and only allow non-unique tags when the user specifically request it? I feel that having aadd_tag method can cause the user to go stright to that one without realizing there is a more specific add_unique_tag method if they want the tag to be unique. It is more common for a tag to be unique than not.

With a single add_tag method that enforces uniqueness, fi the user wants to add non unique tags, the code will break during testing and they will be force to add an unique flag set to False.

Comment on lines +13 to +36
def test_init_empty(self):
"""Test empty initialization."""
tags = TagsSet()
assert tags.get_tags() == []

def test_add_single_tag(self):
"""Test adding a single tag."""
tags = TagsSet()
tags.add_tag('env', 'prod')
assert tags.get_tags() == ['env:prod']

def test_add_multiple_values_same_key(self):
"""Test adding multiple values to same key."""
tags = TagsSet()
tags.add_tag('env', 'prod')
tags.add_tag('env', 'dev')
assert tags.get_tags() == ['env:dev', 'env:prod']

def test_add_different_keys(self):
"""Test adding tags with different keys."""
tags = TagsSet()
tags.add_tag('env', 'prod')
tags.add_tag('app', 'web')
assert tags.get_tags() == ['app:web', 'env:prod']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably parameterize all these tests, they are doing the same thing in different ways.

The same with the tests below that test the same thing with different inputs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I prefer the simpler way, the duplicated code is minimal, and it’s not worth the added complexity and loss of readability that comes with using pytest.parametrize


def test_init_empty(self):
"""Test empty initialization."""
tags = TagsSet()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a fixture and we do not need to initialize an emtpy TagSet in every test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hot take (don’t hate me): I’m not a fan of fixtures. I just find it way clearer when we start with an empty tag set and build it up, nice and simple: Arrange, Act, Assert. Fixtures do all kinds of stuff behind the scenes, which makes it harder to follow sometimes. I’d use them here if setting up the TagsSet were more complex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants