I have a file containing JSON objects, one per line, like the ones below.

E.g., Input.txt

1. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K11HE-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K11HE-D", "Pi": "CHAF2", "Gi": "RV1688668060"}

2. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08JV-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08JV-D", "Pi": "CHAF2", "Gi": "RV1714277379"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08OW-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08OW-D", "Pi": "CHAF2", "Gi": "RV1714277380"}

The file contains thousands of rows.

I want to group all the JSON objects in the file that have the same value for the key "Ti".

Below is an example to elaborate on my requirement.

As you can see from the sample file above, three lines have the same value for the key "Ti": lines 1, 2 and 4. In all of them, the value of "Ti" is "Q2".

I need a way to join those JSON objects and create an output file that looks like the one below.

E.g., Output.txt

1. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

2. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

Please let me know how I can achieve this.

  • The easiest way I can think of is to load the JSON into a dataframe, combine the rows that have the same "Ti" value, and then convert the dataframe back to JSON. That would be easier than manipulating the JSON as it is. It would also help if you shared the raw JSON contents rather than formatting them in the question, and described what you have tried so far. Commented Feb 19, 2020 at 7:43

1 Answer

You need to:

  1. Convert each line's string into a dictionary.
  2. Collect the "Ti" value of every row.
  3. Loop over the dictionaries and collect the data of all rows that share each "Ti" value.

import json
import re

# Read the non-empty lines of the input file (the file is closed automatically)
with open('test.txt', 'r') as raw_data:
    data_list = [line for line in raw_data.read().splitlines() if line.strip()]

# Parse each line once: drop the "N. " prefix if there is one, then decode
# the JSON object. json.loads is used instead of eval, which would execute
# arbitrary code found in the file.
rows = [json.loads(re.sub(r'^\d+\.\s*', '', item)) for item in data_list]

# Collect data into a new dictionary: one merged entry per input row
data = {}
for i, current in enumerate(rows, start=1):
    merged = {}
    for row in rows:
        # Merge every row that shares the current row's Ti value
        if row.get('Ti') == current.get('Ti'):
            for key, value in row.items():
                merged.setdefault(key, []).append(value)
    data[str(i) + '.'] = merged

# Print one merged row per line
for number, merged in data.items():
    print(number, merged)


Output:

1. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
2. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
3. {'Cp': ['1000'], 'Af': ['CBS'], 'Bp': ['150'], 'Vt': ['channel'], 'Ti': ['ABCD'], 'Cs': ['K20LT-D'], 'Tg': ['BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K20LT-D'], 'Pi': ['CHAF2'], 'Gi': ['RV1714278093']}
4. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
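
Since the file contains thousands of rows, comparing every row against every other row can get slow. A minimal sketch of a single-pass alternative, assuming the rows have already been parsed into dictionaries; the helper name `merge_by_key` and the shortened sample rows are hypothetical and only serve as illustration:

```python
from collections import defaultdict


def merge_by_key(rows, key="Ti"):
    """Return one merged dict per input row, grouping rows that share `key`."""
    groups = defaultdict(dict)
    # Single pass: append every field of a row to the lists of its group
    for row in rows:
        merged = groups[row.get(key)]
        for field, value in row.items():
            merged.setdefault(field, []).append(value)
    # Emit the merged group once per input row, preserving the input order
    return [groups[row.get(key)] for row in rows]


rows = [
    {"Ti": "Q2", "Cs": "K11HE-D"},
    {"Ti": "Q2", "Cs": "K08JV-D"},
    {"Ti": "ABCD", "Cs": "K20LT-D"},
]
result = merge_by_key(rows)
print(result[0])
```

Rows that belong to the same group share one merged dictionary here, so copy it before modifying a single entry.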
11 Comments

Thank you for the quick response.
@KiranDas It means that your data rows don't contain a number.
Your code works just fine. But I see that you are also considering the line number. In reality, my file doesn't have line numbers; I only used them to make the explanation clearer. Can you please suggest how we can achieve this without considering the line numbers? Thanks a lot for your help.
The 3 lines below show exactly how the lines appear in the file:
{"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K11HE-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K11HE-D", "Pi": "CHAF2", "Gi": "RV1688668060"}
{"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08JV-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08JV-D", "Pi": "CHAF2", "Gi": "RV1714277379"}
{"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}
Yes, correct. The data rows don't contain a number. However, if I insert the line number into all the rows, that should solve the issue. Thanks a lot for the help. :)
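
Instead of inserting line numbers into the file, the numeric prefix can be treated as optional while parsing. A minimal sketch, assuming one JSON object per line; the names `PREFIX` and `parse_row` and the shortened sample lines are hypothetical:

```python
import json
import re

# Optional "N. " prefix at the start of a line
PREFIX = re.compile(r"^\s*\d+\.\s*")


def parse_row(line):
    """Decode one line into a dict, ignoring a leading line number if present."""
    return json.loads(PREFIX.sub("", line, count=1))


lines = [
    '1. {"Ti": "Q2", "Cs": "K11HE-D"}',
    '{"Ti": "Q2", "Cs": "K08JV-D"}',
]
rows = [parse_row(line) for line in lines if line.strip()]
print(json.dumps(rows[1]))
```

Writing the rows back with `json.dumps` keeps double quotes, so the output file stays valid JSON.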