Formatting a string in required format in Python

Question

I have a data in format:

id1 id2 value Something like

1   234  0.2
1   235  0.1

and so on. I want to convert it in json format:

{
  "nodes": [ {"name":"1"},  #first element
             {"name":"234"}, #second element
             {"name":"235"} #third element
             ] ,
   "links":[{"source":1,"target":2,"value":0.2},
             {"source":1,"target":3,"value":0.1}
           ]
}

So, from the original data to above format.. the nodes contain all the set of (distinct) names present in the original data and the links are basically the line number of source and target in the values list returned by nodes. For example:

   1 234 0.2

1 is in the first element in the list of values holded by the key "nodes" 234 is the second element in the list of values holded by the key "nodes"

Hence the link dictionary is {"source":1,"target":2,"value":0.2}

How do i do this efficiently in python.. I am sure there should be better way than what I am doing which is so messy :( Here is what I am doing from collections import defaultdict

def open_file(filename,output=None):
    f = open(filename,"r")
    offset = 3429
    data_dict = {}
    node_list = []
    node_dict = {}
    link_list = []
    num_lines = 0
    line_ids = []
    for line in f:
        line = line.strip()
        tokens = line.split()
        mod_wid  = int(tokens[1]) + offset


        if not node_dict.has_key(tokens[0]):
            d = {"name": tokens[0],"group":1}
            node_list.append(d)
            node_dict[tokens[0]] = True
            line_ids.append(tokens[0])
        if not node_dict.has_key(mod_wid):
            d = {"name": str(mod_wid),"group":1}
            node_list.append(d)
            node_dict[mod_wid] = True
            line_ids.append(mod_wid)


        link_d = {"source": line_ids.index(tokens[0]),"target":line_ids.index(mod_wid),"value":tokens[2]}
        link_list.append(link_d)
        if num_lines > 10000:
            break
        num_lines +=1


    data_dict = {"nodes":node_list, "links":link_list}

    print "{\n"
    for k,v in data_dict.items():
        print  '"'+k +'"' +":\n [ \n " 
        for each_v in v:
            print each_v ,","
        print "\n],"
    print "}"

open_file("lda_input.tsv")

By "efficiently" do you mean "using less CPU time" or "using less programmer time"? — abarnert
– abarnert, Commented Feb 6, 2013 at 1:35

abarnert · Accepted Answer · 2013-02-06 01:49:17Z

I'm assuming by "efficiently" you're talking about programmer efficiency—how easy it is to read, maintain, and code the logic—rather than runtime speed efficiency. If you're worried about the latter, you're probably worried for no reason. (But the code below will probably be faster anyway.)

The key to coming up with a better solution is to think more abstractly. Think about rows in a CSV file, not lines in a text file; create a dict that can be rendered in JSON rather than trying to generate JSON via string processing; wrap things up in functions if you want to do them repeatedly; etc. Something like this:

import csv
import json
import sys

def parse(inpath, namedict):
    lastname = [0]
    def lookup_name(name):
        try:
            print('Looking up {} in {}'.format(name, names))
            return namedict[name]
        except KeyError:
            lastname[0] += 1
            print('Adding {} as {}'.format(name, lastname[0]))
            namedict[name] = lastname[0]
            return lastname[0]
    with open(inpath) as f:
        reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
        for id1, id2, value in reader:
            yield {'source': lookup_name(id1),
                   'target': lookup_name(id2),
                   'value': value}

for inpath in sys.argv[1:]:
    names = {}
    links = list(parse(inpath, names))
    nodes = [{'name': name} for name in names]
    outpath = inpath + '.json'
    with open(outpath, 'w') as f:
        json.dump({'nodes': nodes, 'links': links}, f, indent=4)

Blender · Accepted Answer · 2013-02-06 01:25:10Z

1

Don't construct the JSON manually. Make it out of an existing Python object with the json module:

def parse(data):
    nodes = set()
    links = set()

    for line in data.split('\n'):
        fields = line.split()

        id1, id2 = map(int, fields[:2])
        value = float(fields[2])

        nodes.update((id1, id2))
        links.add((id1, id2, value))

    return {
        'nodes': [{
            'name': node
        } for node in nodes],
        'links': [{
            'source': link[0],
            'target': link[1],
            'value': link[2]
        } for link in links]
    }

Now, you can use json.dumps to get a string:

>>> import json
>>> data = '1   234  0.2\n1   235  0.1'
>>> parsed = parse(data)
>>> parsed
    {'links': [{'source': 1, 'target': 235, 'value': 0.1},
  {'source': 1, 'target': 234, 'value': 0.2}],
 'nodes': [{'name': 1}, {'name': 234}, {'name': 235}]}
>>> json.dumps(parsed)
    '{"nodes": [{"name": 1}, {"name": 234}, {"name": 235}], "links": [{"source": 1, "target": 235, "value": 0.1}, {"source": 1, "target": 234, "value": 0.2}]}'

answered Feb 6, 2013 at 1:25

Blender

300k55 gold badges462 silver badges511 bronze badges

1 Comment

abarnert Over a year ago

No, this isn't the same output he asked for. Notice that his links are like {'source': 1, 'target': 2, 'value': 0.1}—in other words, each one is the (1-based) index of the name in the nodes list, not the actual name. So, you can't use a set here; you have to use a dict (or a list, but that would probably be more complicated).

Collectives™ on Stack Overflow

Formatting a string in required format in Python

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related