
I'm trying to merge two JSON files into a single JSON using python.

File1:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": "hadoop,users,localusers"
            },        
            {
                    "name": "proxy.groups",
                    "value": "group1, group2, group3"
            },
            {
                    "name": "internal.user.groups",
                    "value": "group1, group2"
            }
    ]
}

File2:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": ""
            },        
            {
                    "name": "proxy.groups",
                    "value": "group3, group4, group5"
            },
            {
                    "name": "internal.groups",
                    "value": "none"
            }
    ]
}

Final expected result:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected], [email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": "hadoop,users,localusers"
            },        
            {
                    "name": "proxy.groups",
                    "value": "group1, group2, group3, group4, group5"
            },
            {
                    "name": "internal.user.groups",
                    "value": "group1, group2"
            },
            {
                    "name": "internal.groups",
                    "value": "none"
            }
    ]
}

I need to merge based on below rules:

  1. If a 'name' value in the key2 list appears in both files, concatenate the corresponding 'value' strings.

    e.g.

    File1:

    "key2": [{"name" : "firstname", "value" : "bob"}]
    

    File2:

    "key2": [{"name" : "firstname", "value" : "charlie"}]
    

    Final output:

    "key2": [{"name" : "firstname", "value" : "bob, charlie"}]
    

Some considerations while appending the values:

  • If both files contain duplicate value(s) in 'value', the final result should be the union of the values, with no duplicates.

  • If either 'value' contains ' * ', then the final value should be ' * '.

  2. If a 'name' key in the 2nd JSON file is not present in the 1st file, add it to the first file.

I've written a python script to load the two JSON files and merge them but it seems to just concatenate everything into the first JSON file.

    def merge(a, b):
        "merges b into a"
        for key in b:
            if key in a:# if key is in both a and b
                if key == "key1":
                    pass
                elif key == "key2":
                    for d1, d2 in zip(a[key], b[key]):
                        for key, value in d1.items():
                            if value != d2[key]:
                                a.append({"name": d2[key], "value": d2["value"]})
                else:
                  a[key] = a[key]+ b[key]
            else: # if the key is not in dict a , add it to dict a
                a.update({key:b[key]})
        return a

Can someone point out how I can compare the value for the "name" section with the list for key2 in both the files and concatenate the values in "value"?

  • Once you read and deserialized the data, it has nothing to do with JSON any longer, so you could well reduce your question. Commented Sep 20, 2018 at 5:41
  • @Ulrich Eckhardt Thanks for pointing it out, but doesn't writing back to the first file serialize it back? Commented Sep 21, 2018 at 17:03

3 Answers


Here's a solution that runs in linear time, using a dictionary to quickly look up an item in a's key2 list by its name. b's key2 list is iterated through once, and a is modified in constant time per item as required. Sets are used to eliminate duplicates and handle asterisks.

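def merge(a, b):
    """Merge b['key2'] into a['key2'] in place, matching entries on 'name'."""
    def split_values(value):
        # "x, y" -> {"x", "y"}
        return {x.strip() for x in value.split(",")}

    # Map each name to its entry in a for constant-time lookup
    lookup = {o['name']: o for o in a['key2']}

    # Convert every value string in a into a set of items
    for e in a['key2']:
        e['value'] = split_values(e['value'])

    for e in b['key2']:
        if e['name'] in lookup:
            # Name exists in both: take the union of the values
            lookup[e['name']]['value'].update(split_values(e['value']))
        else:
            # Name only in b: append the entry to a
            e['value'] = split_values(e['value'])
            a['key2'].append(e)

    # Convert the sets back to strings; an asterisk overrides everything else
    for e in a['key2']:
        if "*" in e['value']:
            e['value'] = "*"
        else:
            e['value'] = ", ".join(sorted(e['value']))

    return a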

Sample output:

key1:
    protocol1
key2:
    {'name': 'user.name', 'value': '[email protected], [email protected]'}
    {'name': 'user.shortname', 'value': 'user'}
    {'name': 'proxyuser.hosts', 'value': '*'}
    {'name': 'kb.groups', 'value': ', hadoop, localusers, users'}
    {'name': 'proxy.groups', 'value': 'group1, group2, group3, group4, group5'}
    {'name': 'internal.user.groups', 'value': 'group1, group2'}
    {'name': 'internal.groups', 'value': 'none'}
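
The kb.groups entry in the sample output above starts with a stray ", " because File2's empty string survives the split as an empty item. Below is a minimal sketch of one way to avoid that, assuming empty items should simply be dropped. The names split_values and merge_nonempty are illustrative and not part of the code above. Unlike the code above, it rebuilds the key2 list instead of editing entries in place. Because every value is converted back to a string, json.dumps accepts the result:

```python
import json


def split_values(value):
    """Split a comma-separated string into a set of stripped, non-empty items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def merge_nonempty(a, b):
    """Merge b['key2'] into a['key2'] by 'name', ignoring empty items."""
    merged = {}  # name -> set of values; dicts keep insertion order (Python 3.7+)
    for entry in a["key2"] + b["key2"]:
        merged.setdefault(entry["name"], set()).update(split_values(entry["value"]))

    # Rebuild the list; an asterisk overrides all other values (rule from the question)
    a["key2"] = [
        {"name": name, "value": "*" if "*" in values else ", ".join(sorted(values))}
        for name, values in merged.items()
    ]
    return a


a = {"key1": "protocol1", "key2": [
    {"name": "kb.groups", "value": "hadoop,users,localusers"},
    {"name": "proxyuser.hosts", "value": "*"},
]}
b = {"key1": "protocol1", "key2": [
    {"name": "kb.groups", "value": ""},
    {"name": "proxyuser.hosts", "value": "*"},
    {"name": "internal.groups", "value": "none"},
]}

result = merge_nonempty(a, b)
print(json.dumps(result, indent=4))  # plain strings, so this serializes fine
```

Sorting the items makes the output deterministic, at the cost of losing the original order of the values.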

8 Comments

Thank you ggorlen and @Serge Ballesta for your answers. The code works well for comparing the "name" values, but it needs to be improved for the following case: if "value" is " * " in both files, merging the values for the same "name" should ideally produce a single " * ", but right now it produces " * , * ".
I don't 100% follow--you're saying you want the value list to be unique items only? Please update your question with the new requirement and I'll update my answer when I get a moment.
Updated the original question to be clearer. Thank you!
This line doesn't append the values; instead it overwrites them: lookup[e['name']]['value'] = e['value']. I did try appending, but one problem I face is duplicate values, since there could be values in "value" that match in both lists. I tried using set() to eliminate duplicates, but I get an error that set is not JSON serializable.
I'm not understanding your updated version. It's all clear except for the proxy.groups key. Both File1 and File 2 have values group1, group2, group3. I would expect output of the merge to be group1, group2, group3, but you somehow are looking for group1, group2, group3, group4, group5, which is strange both because two groups were made up out of the blue and if you are interested in numerical group extensions, I'd expect 6 groups to be present rather than 5. Please explain how you're getting this transformation. I updated my code in the meantime to handle everything but this edge case.

Order of elements in a["key2"] and b["key2"] is not guaranteed to be the same, so you should build a mapping from the "name" value to the index in a["key2"], and then browse b["key2"] comparing each "name" value to that dict.

Code could be:

def merge(a, b):
    "merges b into a"
    for key in b:
        if key in a:# if key is in both a and b
            if key == "key2":
                # build a mapping from names from a[key2] to the member index
                akey2 = { d["name"]: i for i,d in enumerate(a[key]) }
                for d2 in b[key]:      # browse b["key2"]
                    if d2["name"] in akey2:   # a name from a["key2"] matches
                        a[key][akey2[d2["name"]]]["value"] += ", " + d2["value"]
                    else:
                        a[key].append(d2)     # when no match
        else: # if the key is not in dict a , add it to dict a
            a[key] = b[key]
    return a

You can then test it:

import pprint

a = {"key1":    "value1",
     "key2": [{"name" : "firstname", "value" : "bob"}]
     }
b = {"key1":    "value2",
     "key2": [{"name" : "firstname", "value" : "charlie"},
          {"name" : "foo", "value": "bar"}]
     }
merge(a, b)

pprint.pprint(a)

gives as expected:

{'key1': 'value1',
 'key2': [{'name': 'firstname', 'value': 'bob, charlie'},
          {'name': 'foo', 'value': 'bar'}]}
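
If duplicate values should not be appended, one option is an order-preserving union of the two comma-separated strings. The following is a minimal sketch, assuming values are separated by commas and that surrounding whitespace and empty items are insignificant. The names union_values and merge_unique are hypothetical, and the "*" handling follows the rule stated in the question:

```python
def union_values(first, second):
    """Order-preserving union of two comma-separated strings."""
    items = [x.strip() for x in (first + "," + second).split(",")]
    # dict.fromkeys keeps the first occurrence of each item, in order
    unique = [x for x in dict.fromkeys(items) if x]
    return "*" if "*" in unique else ", ".join(unique)


def merge_unique(a, b):
    """Merge b["key2"] into a["key2"], matching entries on "name"."""
    # Map each name in a["key2"] to its index in the list
    index = {d["name"]: i for i, d in enumerate(a["key2"])}
    for d2 in b["key2"]:
        if d2["name"] in index:
            d1 = a["key2"][index[d2["name"]]]
            d1["value"] = union_values(d1["value"], d2["value"])
        else:
            a["key2"].append(d2)
            index[d2["name"]] = len(a["key2"]) - 1  # remember the new entry
    return a


a = {"key1": "protocol1",
     "key2": [{"name": "proxy.groups", "value": "group1, group2, group3"}]}
b = {"key1": "protocol1",
     "key2": [{"name": "proxy.groups", "value": "group3, group4, group5"},
              {"name": "proxyuser.hosts", "value": "*"}]}
merged = merge_unique(a, b)
print(merged["key2"])
```

Using dict.fromkeys rather than a set keeps the values in the order they first appear, which relies on dicts preserving insertion order (Python 3.7+).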

5 Comments

@ggorlen: I have only seen your answer after posting mine, and then realized they were the same. I leave this one here because it integrates the code in OP's merge function. If you do it in your answer and ping me in a comment, I will remove this one.
can we avoid appending duplicate values here (values already present in a)?
@TusharKarkera: Yes. It would be enough to split both value strings into sets, do the union of both sets and then join the result. Too lazy to do it... Anyway it would require a precise specification of the value format.
I've added the precise specifications for the input formats and how they should be merged in my edit for original question. Let me know if anything is not clear. Thanks for your help!
@TusharKarkera: As I've already said, I'm too lazy for that, and ggorlen's answer already contains what I could have done

Just loop through the keys: if a key is not in the new dict, add it; if it is, merge the two values.

d1 = {"name" : "firstname", "value" : "bob"}
d2 = {"name" : "firstname", "value" : "charlie"}
d3 = {}

for i in d1:
    for j in d2:
        if i not in d3:
            d3[i] = d1[i]
        else:
            d3[i] = '{}, {}'.format(d1[i], d2[i])

print(d3)

Output:

(xenial)vash@localhost:~/python/stack_overflow$ python3.7 formats.py 
{'name': 'firstname, firstname', 'value': 'bob, charlie'}
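
Note that the output above joins the 'name' field as well ('firstname, firstname'), which is not what the question asks for. Here is a minimal sketch that concatenates only 'value', and only when both names match. The helper name merge_pair is hypothetical, and de-duplication and the "*" rule are left out:

```python
def merge_pair(d1, d2):
    """Return a new dict combining d1 and d2 if their names match, else None."""
    if d1["name"] != d2["name"]:
        return None
    return {"name": d1["name"],
            "value": "{}, {}".format(d1["value"], d2["value"])}


d1 = {"name": "firstname", "value": "bob"}
d2 = {"name": "firstname", "value": "charlie"}
d3 = merge_pair(d1, d2)
print(d3)  # {'name': 'firstname', 'value': 'bob, charlie'}
```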
