
I have a file with the following json format in python:

{"header":{"a":"1","b":"1"}, 

"data":[{"a":"1", "b":{"ba":"b1","bb":"b2","bc":"b3"}, "c":{"ca":"x1","cb":"x2","cc":"x3"}, "d":"4"}, 

        {"a":"12", "b":{"ba":"12a","bb":"12ab","bc":"1ab"},"c":{"ca":"12z","cb":"12zz","cc":"12zzz"}, "d":"12"}
       ]}

I've written a csv parser without the nested 'b' and 'c' elements, but I'm having difficulty parsing selected elements from 'b' and 'c' into my csv. Here's what I have so far:

import csv
import json

#load json
try:
    with open('tmp.p', 'rb') as f:
        myjson = json.load(f)
except IOError:
    print("Error reading json file")

#write selected json to a csv output file
out = open(savedpath, 'a+')   # savedpath is the output csv path, defined elsewhere
try:
    #add or remove data to parse here
    mydata = ('d', 'b', 'a')

    mycsv = csv.DictWriter(out, fieldnames=mydata, quoting=csv.QUOTE_ALL,
                           extrasaction='ignore', lineterminator='\n')
    mycsv.writeheader()
    for row in myjson["data"]:
        mycsv.writerow(row)
finally:
    out.close()

I've parsed the nested elements to separate temp files:

# parse nested elements
t3 = open('tmp3.p', 'wb')   # temp file for the selected 'b' elements
t4 = open('tmp4.p', 'wb')   # temp file for the selected 'c' elements
try:
    #add or remove Port1/Port2 data to parse here
    myport = ('bb', 'ba')
    tmp3 = csv.DictWriter(t3, fieldnames=myport, quoting=csv.QUOTE_ALL,
                          extrasaction='ignore', lineterminator='\n')
    tmp3.writeheader()
    tmp4 = csv.DictWriter(t4, fieldnames=myport, quoting=csv.QUOTE_ALL,
                          extrasaction='ignore', lineterminator='\n')
    tmp4.writeheader()
    #print myjson["data"][0]["b"]["bb"]
    #print myjson["data"][0]["c"]["bb"]

    for row in myjson["data"]:
        data1 = row["b"]
        data2 = row["c"]
        #print data1["bb"]

        tmp3.writerow(data1)
        tmp4.writerow(data2)
finally:
    t3.close()
    t4.close()

But I am having trouble joining the data. I want the data to look like this in my csv:

#header
a:1
b:1
#data
a,d,ba,bc,ca,cc
1,4,b1,b3,x1,x3
12,12,12a,1ab,12z,12zzz

I'm stuck when trying to write my csv file. I think I'm over-thinking this; I thought concatenating the strings might work, but it didn't:

try:
    with open('tmp3.p', 'rb') as port1:
        with open('tmp4.p', 'rb') as port2:
            with open('tmp5.p', 'rb') as general:
                for rport1 in port1:
                    for rport2 in port2:
                        for rgen in general:
                            rport1 = rport1.replace("\n", "")
                            rport2 = rport2.replace("\n", "")
                            rgen = rgen.replace("\n", "")
                            string = ("%s,%s,%s" % (rgen, rport1, rport2))
                            print string
except IOError:
    print "Error opening temp files"

I used DictWriter because I need the fields written in a specific order. I know I'm defeating the purpose of using json when trying to combine the data, and it's really bad practice, but I'm not sure how to proceed. Thank you in advance for helping...

  • As a side note, indenting everything 7 levels down makes it much harder to read. Unless you're using an old version of Python, just use a single with for all three files, and look into whether you can simplify the loops or factor them into functions (neither may be appropriate, but it's worth trying). Commented Oct 12, 2013 at 0:46
  • Where are the bb and cb columns in the output? Commented Oct 12, 2013 at 5:27
  • How should the data for the 'b' and 'c' elements be represented/formatted in a single column each? The only way I can think of is as a string -- but even so, in what format should they be presented in the string, json? Commented Oct 12, 2013 at 12:14
  • @abarnert, thanks for replying. I'm using Python 2.7, and yes, I'll change to use a single with to open the 3 files. Simplifying the loops is where the trouble is. With what I have now, the first for loop gets iterated with n repetitions of the 2nd and 3rd for loops, and so on... I don't know how to iterate through each file synchronously and concatenate the strings from the n-th line of each file. Do you have an example in mind to help me get started? Commented Oct 13, 2013 at 7:59
  • @lucemia, thank you for your reply. I don't want to parse those elements to the csv. I'm using DictWriter's 'ignore' to do that. Commented Oct 13, 2013 at 8:01

1 Answer


I'm not 100% sure I understand what you want, but I think I can guess from this comment:

Simplifying the loops is where the trouble is. With what I have now, the first for loop gets iterated with n repetitions of the 2nd and 3rd for loops, and so on... I don't know how to iterate through each file synchronously and concatenate the strings from the n-th line of each file.

What you want is not nested iteration, but lockstep iteration. In other words, you don't want the first rport1 with each rport2, then the second rport1 with each rport2, and so on; you want the first rport1 with just the first rport2, then the second rport1 with just the second rport2, and so on.

If so, you're looking for zip.

I'll show the difference with a stripped-down example:

>>> seq1 = [1, 2, 3]
>>> seq2 = [4, 5, 6]
>>> 
>>> for i in seq1:
...     for j in seq2:
...         print i, j
1 4
1 5
1 6
2 4
2 5
2 6
3 4
3 5
3 6
>>> 
>>> for i, j in zip(seq1, seq2):
...     print i, j
1 4
2 5
3 6
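
Applied to your three temp files, that lockstep iteration might look roughly like this (a sketch only, assuming Python 2.7 as in your comment, and that tmp3.p, tmp4.p, and tmp5.p each hold one CSV line per record as in your snippets above):

try:
    with open('tmp5.p', 'rb') as general, \
         open('tmp3.p', 'rb') as port1, \
         open('tmp4.p', 'rb') as port2:
        # zip pairs up the n-th line of each file
        for rgen, rport1, rport2 in zip(general, port1, port2):
            # strip trailing newlines before joining the three pieces
            line = "%s,%s,%s" % (rgen.rstrip("\n"), rport1.rstrip("\n"), rport2.rstrip("\n"))
            print line
except IOError:
    print "Error opening temp files"

Each pass through the loop now gets the matching line from all three files, so the header lines pair up with each other and each data line pairs up with its counterparts; instead of print, you can of course write line to your output csv.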

3 Comments

Awesome, thanks!! Now that I have a working solution, is this the best practice to parse a json file with these requirements: (1) parse only selected elements (including nested elements), (2) have the selected fields displayed in a specific order (not alphabetically sorted)? My current solution's sequence is (1) parse the header, (2) store selected 'data' elements in a tmp file (referring to my snippets above), (3) store selected nested 'b' & 'c' elements to another tmp file, (4) open the tmp files, join & write the selected 'data'-level & nested elements to the output file, (5) delete all tmp files.
@user2793334: Normally, the best thing to do with a JSON file is to parse the whole thing with json.load, then pick it apart as a Python object, instead of trying to partially parse JSON manually (a sketch of that approach follows below). Sometimes this isn't appropriate (e.g., you have a 1GB JSON file, or it's not quite valid JSON because it has repeated keys within an object or depends on ordering), but start with the easy way unless you know it won't work.
Got it. I've written a function to validate my JSON format in the beginning of my script. The JSON will be uploaded to the DB, and I'm using the same JSON to manipulate the output csv file for non-techies. Thank you :)
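
For completeness, here is a minimal sketch of the json.load-and-pick-apart approach mentioned in the comments above, which skips the temp files entirely by merging the selected nested 'b' and 'c' keys into each row before writing. It assumes Python 2.7, the a,d,ba,bc,ca,cc column order shown in the question, and a placeholder output path out.csv:

import csv
import json

# load the whole JSON document, then pick it apart as a Python object
with open('tmp.p', 'rb') as f:
    myjson = json.load(f)

# desired column order from the question
fieldnames = ('a', 'd', 'ba', 'bc', 'ca', 'cc')

with open('out.csv', 'wb') as out:   # out.csv is a placeholder path
    writer = csv.DictWriter(out, fieldnames=fieldnames, quoting=csv.QUOTE_ALL,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in myjson["data"]:
        flat = dict(row)        # copy the top-level keys ('a', 'd', ...)
        flat.update(row["b"])   # pull the nested 'b' keys up (ba, bb, bc)
        flat.update(row["c"])   # pull the nested 'c' keys up (ca, cb, cc)
        writer.writerow(flat)   # extrasaction='ignore' drops bb, cb and the nested dicts

With the sample data this produces the a,d,ba,bc,ca,cc header and the two data rows shown in the question (every field quoted because of QUOTE_ALL); the #header section would still need to be written separately.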
