
While reading a text file with values, I want to create variables on the fly.

I've got the values stored in a list of lists:

values = [content[startline + i].split() for i in range(_n_lines)]

content is a list of the lines

The variable names are stored in a tuple of tuples, depending on the block I'm reading:

variable_names = (
    ('_idx_brg', 'stn', 'stn_rel', '_n_lines', '_brg_type'),
    ('D', 'L', 'Cb', 'visc'),
    ('stiff', 'damp'),
    ('damp_model', ''))

By default I convert the values into float:

for irow, row in enumerate(variable_names):
    for icol, col in enumerate(row):
        if col:
            val = float(values[irow][icol])
            setattr(self, col, val)

Here is my issue:

In some cases I need a different type, and I want to avoid another list of lists. Is there a clean and short way to provide a type for each variable? I thought about putting the info into variable_names, but that just seems wrong to me.

I would be glad for any advice, also on the part that I'm already using.

*edit for @Rory:*

Here is a sample input text block for the stated example:

6 28 0 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Some comments here
12.527 4.6 0.0365 3.5 0 0 0 0 0 0
0 0  0  0 0 0 0 0 0 0
0  0  0  0 0 0 0 0 0 0
0  0  0  0 0 0 0 0 0 0

The input file has several blocks like this and of course some with another format. The identification of the blocks is done elsewhere in the script.

As you can see, I don't always read the whole block.
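For reference, a runnable sketch of reading that block into values (the startline value and the content[startline + i] offset are assumptions about the intended loop):

```python
# The sample block from above, as a list of lines (content would normally
# come from reading the input file).
content = [
    "6 28 0 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Some comments here",
    "12.527 4.6 0.0365 3.5 0 0 0 0 0 0",
    "0 0  0  0 0 0 0 0 0 0",
    "0  0  0  0 0 0 0 0 0 0",
    "0  0  0  0 0 0 0 0 0 0",
]
startline = 0   # assumed index of the block's first line
_n_lines = 4    # not always the whole block

# offsetting by i reads consecutive lines rather than the same line repeatedly
values = [content[startline + i].split() for i in range(_n_lines)]
print(values[1][:4])  # ['12.527', '4.6', '0.0365', '3.5']
```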

Comments:

  • Python uses dynamic typing; you don't need to provide the types of variables. Commented Jun 12, 2020 at 16:57
  • @Barmar is right, but as you are reading a plain text file, the entries in values will be strings. Converting them might be easier when you use them, or by calling some "sanitizing" function that has actual type-wise lists of variables. Maybe you can show a bit more of what you are doing with this afterwards to decide on a good solution. Did you, as an alternative, consider more automatic storage options like pandas in combination with json/feather or such? Commented Jun 12, 2020 at 17:15
  • @David, unfortunately I'm not so familiar with pandas and json or feather. But I will check up on it. Commented Jun 12, 2020 at 18:09
  • My script is relatively long. The text file I read is an input file for a simulation software. The script is sort of a wrapper to provide an optimization software with the inputs. The optimization software reads and manipulates the input file and runs the simulation software. The input file is quite big and consists of several blocks. For each block I want to collect every parameter which has to be read as a specific type; especially integers are important for indexing. Commented Jun 12, 2020 at 18:22
  • Are you provided with the input files or do you create them yourself? Commented Jun 12, 2020 at 20:11

2 Answers


Well, without getting into the details of your nesting, you could attach a variable type to the name by using a tuple.

I've done this on 2 of your variable names: ('_idx_brg', str), ('stn', 'int')

Rather than using zip, you'll need to hook that back up to your nested tuples and you'll also need to add error handling in case the string value from the file doesn't fit the expected variable type.

import builtins
import pdb

def set_attr(tgt, names, values):

    try:
        for name, value in zip(names, values):
            cls_ = None
            if isinstance(name, str):
                setattr(tgt, name, float(value))
            elif isinstance(name, tuple):
                name, cls_ = name
                if callable(cls_):
                    setattr(tgt, name, cls_(value))
                elif isinstance(cls_, str):
                    cls_ = globals().get(cls_) or getattr(builtins, cls_)
                    setattr(tgt, name, cls_(value))
                else:
                    raise ValueError("variable types have to be a string or callable like `int`,`float`, etc")
    except (ValueError, TypeError, AttributeError) as e:
        print(f"  something's wrong:\n{dict(exception=e, name=name, cls_=cls_, value=value)}")
        #raise

    # catch-all for debugging: drop into pdb on unexpected errors, then re-raise
    except (Exception,) as e:  # pragma: no cover pylint: disable=unused-variable
        if 1:
            pdb.set_trace()
        raise

class Foo:
    pass

variable_names = ('_idx_brg', 'stn', 'stn_rel', '_n_lines', '_brg_type')
values = (1.0, 1, 1.2, 1.3, 1.4, 1.5)

foo = Foo()

print("\n\nsetting for foo")
set_attr(foo, variable_names, values) 

print("\n\nfoo:", vars(foo))

variable_names2 = (('_idx_brg',str), ('stn','int'), 'stn_rel', '_n_lines', ('_brg_type','xxx'))

bar = Foo()

print("\n\nsetting for bar:")
set_attr(bar, variable_names2, values) 

print("\n\nbar:", vars(bar))

output:



setting for foo


foo: {'_idx_brg': 1.0, 'stn': 1.0, 'stn_rel': 1.2, '_n_lines': 1.3, '_brg_type': 1.4}


setting for bar:
  somethings wrong:
{'exception': AttributeError("module 'builtins' has no attribute 'xxx'"), 'name': '_brg_type', 'cls_': 'xxx', 'value': 1.4}


bar: {'_idx_brg': '1.0', 'stn': 1, 'stn_rel': 1.2, '_n_lines': 1.3}
                   👆           👆
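Hooking the typed tuples back into the nested structure could look like this minimal sketch (the sample names and values here are assumptions for illustration; plain string names fall back to float as before):

```python
class Foo:
    pass

def coerce(name):
    """Split a plain name or a (name, type) tuple; default type is float."""
    if isinstance(name, tuple):
        return name
    return name, float

# typed and untyped names mixed per row, mirroring the nested tuples
variable_names = (
    (('_idx_brg', int), 'stn'),
    (('_brg_type', str), 'D'),
)
values = [['6', '28'], ['4', '12.527']]  # assumed sample rows (strings from the file)

foo = Foo()
for name_row, value_row in zip(variable_names, values):
    for name, value in zip(name_row, value_row):
        name, cls_ = coerce(name)
        setattr(foo, name, cls_(value))

print(vars(foo))  # {'_idx_brg': 6, 'stn': 28.0, '_brg_type': '4', 'D': 12.527}
```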

You could even build your own classes.

class Myclass:
    def __init__(self, value):
        self.value = value

# part of your name/type tuples...
(('somevar', Myclass), ('_idx_brg', str), ...)

edit re. yaml:

I have not tested this, so you may have to adjust a bit, especially around the exact yaml needed to get a dict with a nested varnames dict in it.

---
varnames:
  _idx_brg: str
  stn: int

import builtins
from yaml import safe_load as yload

with open("myconfig.yaml") as fi:
    config = yload(fi)

mapping = {}

#the yaml is all strings right now
# map it to actual types/classes
for name, type_ in config["varnames"].items():
    cls_ = globals().get(type_) or getattr(builtins, type_)
    mapping[name] = cls_

#using it
for name, value in zip(names, values):

    #fall back to `float` if there is no special-case for this varname
    cls_ = mapping.get(name, float)
    setattr(tgt, name, cls_(value))

Now, this does rely on all instances of a given variable name having the same type no matter where they sit in the data hierarchy, but that's just best practice.

The other thing is that, if one area looks a bit fishy/brittle to me, it is your complex nesting of tuples of values and names that somehow always need to stay in sync. Much more so than your basic requirement to load text data (whose format is not under your control) and then format it in different ways. I'd work at getting your names to flow more naturally with the data. Maybe try to identify incoming data by record type and then assign a mapping class to each? Same thing as what you're doing, really, just without relying on complex nesting.

Or maybe, going from your remark about row, column, you could put all that into the yaml config file as well, load that into a mapping data structure and explicitly use indices rather than nested loops? Might make your code a lot simpler to reason about and adjust for data changes.
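A sketch of that index-based idea (the layout entries and sample values here are assumptions, not the OP's actual block layout): each variable carries its own (row, col) position plus its type in one flat mapping, which also makes writing back to the exact same position a straightforward lookup.

```python
# One flat mapping instead of parallel nested tuples: position and type
# live together with the variable name (could be loaded from a yaml config).
layout = {
    '_idx_brg': {'row': 0, 'col': 0, 'type': int},
    'stn':      {'row': 0, 'col': 1, 'type': int},
    'D':        {'row': 1, 'col': 0, 'type': float},
}

values = [['6', '28', '0'], ['12.527', '4.6']]  # assumed sample block

class Block:
    pass

blk = Block()
for name, spec in layout.items():
    raw = values[spec['row']][spec['col']]
    setattr(blk, name, spec['type'](raw))

print(vars(blk))  # {'_idx_brg': 6, 'stn': 28, 'D': 12.527}

# writing a value back to the exact same position is the reverse lookup
values[layout['D']['row']][layout['D']['col']] = str(blk.D)
```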

There are also interesting things in the Python data parsing space like Pydantic. Might or might not be helpful.


1 Comment

I like your solution; I just don't need the possibility to pass the type as a string. Is there a way to use yaml (like Rory suggested) for my problem? I have no idea. Otherwise I will stick to your solution. @JL Peyret

From your last paragraph I get the impression that you can control the file format. That being the case, I'd suggest you consider yaml.

As well as numbers, strings, arrays, and objects, YAML supports custom classes.

The following would indicate that you want thing to be a Meh object. Check out the pyyaml docs for more detail.

thing: !Meh
    foo: bar
    ping: echo
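A sketch of wiring that up with PyYAML so it still works under safe_load (the Meh class and its fields follow the example above; registering via yaml_loader is the mechanism, but check the PyYAML docs for your version):

```python
import yaml

# Deriving from yaml.YAMLObject and pointing yaml_loader at SafeLoader
# registers the !Meh constructor with safe_load (no unsafe yaml.load needed).
class Meh(yaml.YAMLObject):
    yaml_tag = '!Meh'
    yaml_loader = yaml.SafeLoader

doc = """\
thing: !Meh
    foo: bar
    ping: echo
"""
data = yaml.safe_load(doc)
print(type(data['thing']).__name__, data['thing'].foo)  # Meh bar
```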

I also get the impression that you're essentially writing your own parser for your own format. It's generally better to use a battle-hardened off-the-shelf parser with a battle-hardened, proven format. It's one less avenue for bugs, and you can stand on the shoulders of the giants who wrote the parser and fixed any bugs that were found over the years.

4 Comments

Generally speaking, I agree with you that, if the OP can use an existing parser, that's much better. I had not noticed that possibility when answering. However... thing: !Meh makes me think you suggest using yaml.load rather than yaml.safe_load, which is a really bad idea if you don't fully control the yaml file, and even then is iffy (say a bad actor writes to an unprotected, because non-code, yaml file). Still, on the whole, using predefined formats is a good suggestion, as long as you don't use unsafe features.
I cannot control the file format. And I don't know how to use yaml for my problem; it seems to make my problem more complicated. It may be important to mention that the format of variable_names is used later on to get the specific row and column of the variable, to write it back to the exact same position.
No, definitely use safe_load(). The Meh class can however be set up such that it's recognised by safe_load(). I believe it's a case of marking it safe by deriving from a certain base class, and setting a certain property.
ah, nice, wasn't aware there was an option to safe_load custom data types. I'll keep that in mind for the future.
