Add TagsSet util class in the base package #20986

NouemanKHAL · 2025-08-08T10:16:21Z

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov · 2025-08-08T11:03:51Z

Codecov Report

❌ Patch coverage is 98.23529% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.41%. Comparing base (3c6079e) to head (02c2258).
⚠️ Report is 69 commits behind head on master.

Additional details and impacted files

Flag	Coverage Δ
activemq	`?`
cassandra	`?`
confluent_platform	`?`
hive	`?`
hivemq	`?`
hudi	`?`
ignite	`?`
jboss_wildfly	`?`
kafka	`?`
presto	`?`
solr	`?`
tomcat	`?`
weblogic	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

AAraKKe

Thanks for creating this! I added a some comments but I would say the most important one is to agree on the api. I feel that having different entry points can cause the intention of this change, avoiding leaking repeated tags when they should be unique, to be bypassed.

AAraKKe · 2025-08-08T10:45:42Z

datadog_checks_base/datadog_checks/base/utils/tagging.py

+        if key not in self._data:
+            self._data[key] = set()
+        self._data[key].add(value)


We can simplify this kind of operation with

Suggested change

if key not in self._data:

self._data[key] = set()

self._data[key].add(value)

self._data.setdefault(key, set()).add(value)

Or better yet, define self._data as a defaultdict(set) from collections. Every time you try to access a key that does not exist will return an empty set.

I didn't want to use the defaultdict because of the silent KeyError that would just quietly return a default value.
For simplicity, keep it again in 3 line seemed simpler and easier to read as well.

AAraKKe · 2025-08-08T10:54:07Z

datadog_checks_base/datadog_checks/base/utils/tagging.py

+        tags_list: list[tuple[str, str]] = []
+        for key, values in self._data.items():
+            for val in values:
+                tags_list.append((key, val))


List comprehension is better for these kind of operations and type is inferred

tags_list = [(key, val) for key, values in self._data.items() for val in values]

I think list comprehensions in such scenarios could can reduce the readability of the code, not worth saving 3 lines of code in my opinion.

AAraKKe · 2025-08-08T10:54:49Z

datadog_checks_base/datadog_checks/base/utils/tagging.py

+    def get_standalone_tags(self) -> set[str]:
+        return self.get_tag('')
+
+    def _get_tag_tuples(self, sort: bool = True) -> list[tuple[str, str]]:


Since we are giving the user the option to sort, we should give them the option to sort ascending or descending as well. WDYT?

I was waiting to check what we do with the tags arg that we pass to the base checks methods, maybe we already sort them and preprocess them somewhere.

AAraKKe · 2025-08-08T11:01:29Z

datadog_checks_base/datadog_checks/base/utils/tagging.py

+        self._data.clear()
+
+    def __iter__(self) -> Iterator[tuple[str, str]]:
+        """Allow iteration over tags: for tag in ts or list(ts)."""


What does this mean?

for tag in ts or list(ts)

It was just a comment to explain how the iter function works, I'll clarify it / remove it.

AAraKKe · 2025-08-08T11:04:21Z

datadog_checks_base/tests/base/checks/test_tagging.py

+from datadog_checks.base.utils.tagging import TagsSet
+
+
+class TestTagsSet:


Can we get rid of the class? Testing with classes as namespaces is a left over from the unittest framework from python core but not necessary for pytest which relies on modules for namespacing.

Adding a class adds unnecessary burden to build the class and on reporting in which each line now will include the module the test is in and the name of the class. Pytest supports classes more as a backward compatibility feature to run tests written with the unittest freamework.

Some tests in the repo have classes but we are not using this anymore in newer implementations. WDYT?

Yes, I just kept the same testing style as the codebase in the base check. I'll look into it!
Are there any drawbacks of having mixed styles?

AAraKKe · 2025-08-08T11:08:33Z

datadog_checks_base/datadog_checks/base/utils/tagging.py

+        """
+        if not key:
+            raise ValueError("Tag key cannot be empty. Use add_standalone_tag() for standalone values.")
+        self._data[key] = {value}


Should we error here? I feel that not erroring can cause unwanted effects in which users end up overriding tags. Maybe give an option to override that by default is set to false. If override is true then we override instead of erroring.

Another question is, if the intention is to prevent the leaks we have seen in the past. Shouldn't we have a single add method the user interacts with to enforce uniqueness and only allow non-unique tags when the user specifically request it? I feel that having aadd_tag method can cause the user to go stright to that one without realizing there is a more specific add_unique_tag method if they want the tag to be unique. It is more common for a tag to be unique than not.

With a single add_tag method that enforces uniqueness, fi the user wants to add non unique tags, the code will break during testing and they will be force to add an unique flag set to False.

AAraKKe · 2025-08-08T11:09:22Z

datadog_checks_base/tests/base/checks/test_tagging.py

+    def test_init_empty(self):
+        """Test empty initialization."""
+        tags = TagsSet()
+        assert tags.get_tags() == []
+
+    def test_add_single_tag(self):
+        """Test adding a single tag."""
+        tags = TagsSet()
+        tags.add_tag('env', 'prod')
+        assert tags.get_tags() == ['env:prod']
+
+    def test_add_multiple_values_same_key(self):
+        """Test adding multiple values to same key."""
+        tags = TagsSet()
+        tags.add_tag('env', 'prod')
+        tags.add_tag('env', 'dev')
+        assert tags.get_tags() == ['env:dev', 'env:prod']
+
+    def test_add_different_keys(self):
+        """Test adding tags with different keys."""
+        tags = TagsSet()
+        tags.add_tag('env', 'prod')
+        tags.add_tag('app', 'web')
+        assert tags.get_tags() == ['app:web', 'env:prod']


We can probably parameterize all these tests, they are doing the same thing in different ways.

The same with the tests below that test the same thing with different inputs.

Honestly, I prefer the simpler way, the duplicated code is minimal, and it’s not worth the added complexity and loss of readability that comes with using pytest.parametrize

AAraKKe · 2025-08-08T11:09:51Z

datadog_checks_base/tests/base/checks/test_tagging.py

+
+    def test_init_empty(self):
+        """Test empty initialization."""
+        tags = TagsSet()


This could be a fixture and we do not need to initialize an emtpy TagSet in every test.

Hot take (don’t hate me): I’m not a fan of fixtures. I just find it way clearer when we start with an empty tag set and build it up, nice and simple: Arrange, Act, Assert. Fixtures do all kinds of stuff behind the scenes, which makes it harder to follow sometimes. I’d use them here if setting up the TagsSet were more complex.

NouemanKHAL added 4 commits August 7, 2025 14:54

add TagsSet in base utils

c577481

update interface to return tags in the formatted format as a list

44d0fd4

add add_standalone_tag method

2f1c853

add get_standalone_tags() method, raise ValueException on empty keys

02c2258

datadog-agent-integrations-bot bot added the base_package label Aug 8, 2025

AAraKKe requested changes Aug 8, 2025

View reviewed changes

		from datadog_checks.base.utils.tagging import TagsSet


		class TestTagsSet:

Add TagsSet util class in the base package #20986

Are you sure you want to change the base?

Add TagsSet util class in the base package #20986

Uh oh!

Conversation

NouemanKHAL commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Motivation

Review checklist (to be filled by reviewers)

Uh oh!

codecov bot commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

AAraKKe left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NouemanKHAL commented Aug 8, 2025 •

edited

Loading

codecov bot commented Aug 8, 2025 •

edited

Loading