
I have a large dictionary that has some large array data in it:

d = {'something': {'else': 'x'}, 'longnumbers': [1,2,3,4,54,6,67,7,7,8,8,8,6,4,3,3,5,6,7,4,3,5,6,54]}

The real dictionary has many more keys and a nested structure. When I use json.dump without indent, I get a compact, single-line output which is not readable. When I set indent, it puts newlines after every separator, including the arrays.

The numerical arrays are long and end up like this:

  "longnumbers": [
    1, 
    2, 
    3, 
    4, 
    54, 
    6, 
    67, 
    7, 
    7, 
    8, 
    8, 
    8, 
    6, 
    4, 
    3, 
    3, 
    5, 
    6, 
    7, 
    4, 
    3, 
    5, 
    6, 
    54
  ], 

Is there any way to get pretty-printed JSON with an indent level, but without placing newlines after array elements? For the example above, I'd like something like this:

{
  "longnumbers": [1, 2, 3, 4, 54, 6, 67, 7, 7, 8, 8, 8, 6, 4, 3, 3, 5, 6, 7, 4, 3, 5, 6, 54],
  "something": {
    "else": "x"
  }
}
  • You will most likely need to code this yourself. Commented Apr 10, 2012 at 22:46
  • @NiklasB. that's the conclusion I came to when I had a similar issue. Fortunately, the json library is implemented in Python, and not especially hard to read, which provides a good base for things. Commented Apr 10, 2012 at 23:18
  • I think a good way would be to just wrap json.dumps and only override the behaviour for handling dicts, passing through all the other element types. Commented Apr 10, 2012 at 23:20
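
The wrapping idea from the last comment could look roughly like the sketch below. It is only an illustration, not code posted in this thread: dumps_indented_dicts is a hypothetical name, non-string keys are simply passed through str(), and anything inside a list (including nested dicts) stays on one line.

```python
import json


def dumps_indented_dicts(obj, indent=2, level=0):
    """Indent dicts by hand; delegate every other type to json.dumps()."""
    if not isinstance(obj, dict):
        # Lists, strings, numbers, booleans and None use the stock encoder,
        # which renders lists on a single line when no indent is given.
        return json.dumps(obj)
    if not obj:
        return "{}"
    pad = " " * indent * (level + 1)
    items = [
        # Assumption: keys are strings; other key types go through str().
        pad + json.dumps(str(key)) + ": "
        + dumps_indented_dicts(value, indent, level + 1)
        for key, value in obj.items()
    ]
    return "{\n" + ",\n".join(items) + "\n" + " " * indent * level + "}"


d = {'something': {'else': 'x'}, 'longnumbers': [1, 2, 3, 4, 54, 6, 67, 7]}
print(dumps_indented_dicts(d))
```

Because strings and keys go through json.dumps(), quoting and escaping are handled by the standard library rather than by hand.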

3 Answers


I ended up just writing my own JSON serializer (the code below targets Python 2):

import json
import numpy

INDENT = 3
SPACE = " "
NEWLINE = "\n"

def to_json(o, level=0):
    ret = ""
    if isinstance(o, dict):
        # Dicts are the only containers that get newlines and indentation.
        ret += "{" + NEWLINE
        comma = ""
        for k,v in o.iteritems():
            ret += comma
            comma = "," + NEWLINE
            ret += SPACE * INDENT * (level+1)
            # json.dumps() quotes the key and escapes special characters.
            ret += json.dumps(str(k)) + ':' + SPACE
            ret += to_json(v, level + 1)

        ret += NEWLINE + SPACE * INDENT * level + "}"
    elif isinstance(o, basestring):
        # Escape quotes, backslashes and control characters.
        ret += json.dumps(o)
    elif isinstance(o, list):
        # Lists stay on a single line.
        ret += "[" + ",".join([to_json(e, level+1) for e in o]) + "]"
    elif isinstance(o, bool):
        # bool must be tested before int, because bool is a subclass of int.
        ret += "true" if o else "false"
    elif isinstance(o, int):
        ret += str(o)
    elif isinstance(o, float):
        ret += '%.7g' % o
    # Note: multi-dimensional arrays are flattened into a single list.
    elif isinstance(o, numpy.ndarray) and numpy.issubdtype(o.dtype, numpy.integer):
        ret += "[" + ','.join(map(str, o.flatten().tolist())) + "]"
    elif isinstance(o, numpy.ndarray) and numpy.issubdtype(o.dtype, numpy.inexact):
        ret += "[" + ','.join(map(lambda x: '%.7g' % x, o.flatten().tolist())) + "]"
    elif o is None:
        ret += 'null'
    else:
        raise TypeError("Unknown type '%s' for json serialization" % str(type(o)))
    return ret

2 Comments

This function is free to use under a BSD license.
This answer saved my day!

@jterrace's answer was written for Python 2, which has since been superseded by Python 3, where some of the types it relies on have changed. So, with all due credit to his answer, I tweaked it a bit for my own use and for compatibility with Python 3, including support for tuples, which are serialized as lists:

import json
import numpy

INDENT = 3
SPACE = " "
NEWLINE = "\n"

# Changed basestring to str, and dict uses items() instead of iteritems().
def to_json(o, level=0):
  ret = ""
  if isinstance(o, dict):
    # Dicts are the only containers that get newlines and indentation.
    ret += "{" + NEWLINE
    comma = ""
    for k, v in o.items():
      ret += comma
      comma = "," + NEWLINE
      ret += SPACE * INDENT * (level + 1)
      # json.dumps() quotes the key and escapes special characters.
      ret += json.dumps(str(k)) + ':' + SPACE
      ret += to_json(v, level + 1)

    ret += NEWLINE + SPACE * INDENT * level + "}"
  elif isinstance(o, str):
    # Escape quotes, backslashes and control characters.
    ret += json.dumps(o)
  # Tuples are interpreted as lists
  elif isinstance(o, (list, tuple)):
    ret += "[" + ",".join(to_json(e, level + 1) for e in o) + "]"
  elif isinstance(o, bool):
    # bool must be tested before int, because bool is a subclass of int.
    ret += "true" if o else "false"
  elif isinstance(o, int):
    ret += str(o)
  elif isinstance(o, float):
    ret += '%.7g' % o
  # Note: multi-dimensional arrays are flattened into a single list.
  elif isinstance(o, numpy.ndarray) and numpy.issubdtype(o.dtype, numpy.integer):
    ret += "[" + ','.join(map(str, o.flatten().tolist())) + "]"
  elif isinstance(o, numpy.ndarray) and numpy.issubdtype(o.dtype, numpy.inexact):
    ret += "[" + ','.join(map(lambda x: '%.7g' % x, o.flatten().tolist())) + "]"
  elif o is None:
    ret += 'null'
  else:
    raise TypeError("Unknown type '%s' for json serialization" % str(type(o)))
  return ret

1 Comment

thank you and @jterrace ! This saves me a lot of hassle.

Ugh, there really should be an option by now for specifying different indents for the two JSON container types. If you want to stay compatible with the core Python json library, an alternative approach is to override the function in that library that is responsible for handling indent (currently _make_iterencode()).

I had a crack at reimplementing _make_iterencode(). Only a few lines had to change to make the indent option optionally accept a tuple (hash-indent, array-indent). Unfortunately, that means replacing the entire _make_iterencode(), which turns out to be pretty big and poorly decomposed. Anyway, the following works for Python 3.4-3.6:

import sys
import json

dat = {"b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "a": 1, "c": "x"}
indent = 2
print(json.dumps(dat, indent=indent))

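if sys.version_info.major == 3 and 4 <= sys.version_info.minor <= 6:
  # _make_iterencode is the local module holding the reimplementation
  # described above (not shown here); it replaces the stock function.
  import _make_iterencode
  json.encoder._make_iterencode = _make_iterencode._make_iterencode
  # Tuple form: (hash-indent, array-indent); None keeps arrays on one line.
  indent = (2, None)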

print(json.dumps(dat, indent=indent))

Gives:

{
  "c": "x",
  "a": 1,
  "b": [
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10
  ]
}
{
  "c": "x",
  "a": 1,
  "b": [1,2,3,4,5,6,7,8,9,10]
}
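
If patching json.encoder is not an option, a version-independent alternative is to let json.dumps() do the indenting and splice compact lists back in afterwards. The following is a minimal sketch of that idea, not part of the reimplementation above: dumps_compact_lists and the placeholder format are hypothetical, and it assumes the random token never occurs in your own strings.

```python
import json
import uuid


def dumps_compact_lists(obj, indent=2):
    """Pretty-print obj with json.dumps(), keeping each list on one line."""
    token = uuid.uuid4().hex  # random marker, assumed absent from real data
    compact = {}              # placeholder string -> compact list rendering

    def mark(value):
        # Recurse into dicts; swap every list or tuple for a placeholder.
        if isinstance(value, dict):
            return {key: mark(item) for key, item in value.items()}
        if isinstance(value, (list, tuple)):
            placeholder = "@@%s_%d@@" % (token, len(compact))
            compact[placeholder] = json.dumps(value)
            return placeholder
        return value

    text = json.dumps(mark(obj), indent=indent)
    for placeholder, rendered in compact.items():
        # The placeholder was emitted as a quoted JSON string; replace the
        # quoted form with the single-line list.
        text = text.replace(json.dumps(placeholder), rendered)
    return text


dat = {"b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "a": 1, "c": "x"}
print(dumps_compact_lists(dat, indent=2))
```

The trade-off is an extra pass over the data and the output string, in exchange for not depending on private functions of the json module.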
