1

I have a comma-separated CSV file with headers, and I want to include the headers in the output table.

Input:

header,word1,word2,word3
supercalifragi,black,white,red
adc,bad,cat,love

Output:

| header         | word1 | word2 | word3 |
| -------------- | ----- | ----- | ----- |
| supercalifragi | black | white | red   |
| adc            | bad   | cat   | love  |

I need to include the headers, and I need to take the length of the words in the input file into account so that the finished table lines up correctly.

Here is the updated code:

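# print the current record as one table row, each cell padded to its column width
function pr(){
    for(i=1;i<=NF;i++)
        printf "| %-"(len[i]+1)"s",$i;
    printf "|\n"
}
# first pass over the file: remember the longest value in each column
NR==FNR{
    for(i=1;i<=NF;i++)
        if(len[i]<length($i)){
            len[i]=length($i);
            word[i]=$i
        }
    next
}
# second pass: print every row
{pr()}
# right after the header, print the dashed separator row
FNR==1{
    for(i=1;i<=NF;i++){
        gsub(/./,"-",word[i]);
        $i=word[i]
    }
    pr()
}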


1
  • Some thoughts. You don't need to terminate the FS and OFS assignments with a semicolon here. In addition, this , OFS will print two pipes instead of one (or should), because the comma already stands for OFS. printf "\n" should read print "" if you want a newline. It won't print your desired output, though. Finally, #NR=1: no. That part of your code is executed for all records, so it starts with NR=1, then NR=2, NR=3, and so on. You might want to read the manual. Commented Apr 20, 2018 at 15:29
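The points in this comment can be seen in a small sketch. The function name nr_demo and the three-line input are made up for illustration; they are not part of the original question.

```shell
# nr_demo is a hypothetical helper; the input is invented sample data
nr_demo() {
    printf 'a,b\nc,d\ne,f\n' |
    awk -F, -v OFS=' | ' '
        { print NR ": " $1, $2 }   # no pattern: runs for every record; the comma prints OFS
        NR == 1 { print "" }       # pattern NR == 1: runs for the first record only
    '
}
nr_demo
```

The first rule prints one line per record with NR counting up from 1, and the empty print adds a blank line after the first record only.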

3 Answers

3

I took the liberty of rewriting the entire code from scratch. This should work:

BEGIN {
    FS=","
    OFS=" | "
}

{
    if (NF>columns) columns = NF    # NF is not set yet in BEGIN, so count columns here
    for (i=0; i<NF; i++) {
        if (NR==1) headers[i] = $(i+1)      # read headers
        else fields[NR][i] = $(i+1)         # data rows; arrays of arrays need gawk 4.0+
        # keep the widest value seen so far in each column
        if (length($(i+1))>transientLength[i]) transientLength[i] = length($(i+1))
    }
}

# pad text with trailing spaces up to the given width
function pad(text, width) {
    return sprintf("%-" width "s", text)
}

END {
    # print header; indexed loops keep the columns in order,
    # which "for (j in array)" does not guarantee
    for (j=0; j<columns; j++)
        printable = (j==0 ? "" : printable OFS) pad(headers[j], transientLength[j])
    print "| " printable " |"
    # print alignments
    printable = ""
    for (j=0; j<columns; j++) {
        sep = pad("", transientLength[j])
        gsub(/ /, "-", sep)
        printable = (j==0 ? "" : printable OFS) sep
    }
    print "| " printable " |"
    # print all rows, in input order
    for (f=2; f<=NR; f++) {
        printable = ""
        for (j=0; j<columns; j++)
            printable = (j==0 ? "" : printable OFS) pad(fields[f][j], transientLength[j])
        print "| " printable " |"
    }
}

But please be aware: you need to clean unnecessary whitespace out of your input file. It should read:

header,word1,word2,word3
supercalifragi,black,white,red
adc,bad,cat,love

Alternatively, you could set FS=", ", but that only fits input where every comma is followed by a single space, as in your original example.
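If cleaning the file by hand is not practical, a regular expression as the field separator is one way to tolerate a varying number of spaces around the commas. This is only a sketch: trim_demo is a hypothetical helper name and the input is invented. It does not strip spaces at the very start or end of a line.

```shell
# trim_demo is a hypothetical helper; the input has uneven spaces around the commas
trim_demo() {
    printf 'header , word1,  word2\nadc,bad ,cat\n' |
    awk -F' *, *' '{ print "[" $1 "][" $2 "][" $3 "]" }'   # brackets make stray spaces visible
}
trim_demo
```

A separator longer than one character is treated by awk as a regular expression, so the spaces on either side of each comma are consumed together with it.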


1 Comment

Be sure to check this other answer as well, for more advanced users.
1

It's not exactly the output you asked for but maybe this is all you really need:

$ column -t -s, -o' | ' < file | awk '1; NR==1{gsub(/[^|]/,"-"); print}'
header         | word1 | word2 | word3
---------------|-------|-------|------
supercalifragi | black | white | red
adc            | bad   | cat   | love

2 Comments

I guess so, if you don't mind adding the pipes in a second step. I don't remember exactly how markdown tables work: do they need whitespace next to the pipe in |---|?
Never occurred to me that this might have anything to do with markdown, the OP doesn't mention it. I thought they just wanted to see tabular text.
0

A shorter alternative that scans the file twice:

$ awk -F' *, *' 'function pr() 
                 {for(i=1;i<=NF;i++) printf "| %-"len[i]+1"s",$i; printf "|\n"}

          NR==FNR{for(i=1;i<=NF;i++) 
                    if(len[i]<length($i)) {len[i]=length($i); word[i]=$i} next}

                 {pr()}

           FNR==1{for(i=1;i<=NF;i++) {gsub(/./,"-",word[i]); $i=word[i]}; pr()}'  file{,}

| header         | word1 | word2 | word3 |
| -------------- | ----- | ----- | ----- |
| supercalifragi | black | white | red   |
| adc            | bad   | cat   | love  |

4 Comments

How would the shorter alternative be implemented as an awk script rather than directly on the command line?
To make a script, copy the contents between the single quotes to a file and run it with awk -f script.name ... If you have specific questions I can answer, but you need to put in some effort first.
The problem I ran into when running the file was with file{,}. What is this for? Is it to designate which file I'm using?
A little bit less readable, but very nice if you want shorter code.
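For the two questions in these comments, here is a minimal sketch of the script-file variant. The file names table.awk and input.csv and the function render_table are assumptions made for this example. In bash, file{,} is brace expansion and turns into the file name written twice, which is how the two-pass program gets to read the same file twice; plain sh may not expand it, so the sketch spells the name out both times.

```shell
# table.awk, input.csv and render_table are hypothetical names used only for this sketch
cd "$(mktemp -d)"
printf 'header,word1,word2,word3\nsupercalifragi,black,white,red\nadc,bad,cat,love\n' > input.csv

cat > table.awk <<'EOF'
BEGIN { FS = " *, *" }          # replaces -F' *, *' from the command line
function pr(   i) {
    for (i = 1; i <= NF; i++)
        printf "| %-" (len[i] + 1) "s", $i
    printf "|\n"
}
NR == FNR {                     # first pass: record the widest value per column
    for (i = 1; i <= NF; i++)
        if (len[i] < length($i)) { len[i] = length($i); word[i] = $i }
    next
}
{ pr() }                        # second pass: print each row padded
FNR == 1 {                      # after the header, print the dashed row
    for (i = 1; i <= NF; i++) { gsub(/./, "-", word[i]); $i = word[i] }
    pr()
}
EOF

echo input.csv{,}               # in bash this prints: input.csv input.csv

# the file is named twice on purpose: once per pass
render_table() { awk -f table.awk input.csv input.csv; }
render_table
```

Setting FS in a BEGIN block keeps the script self-contained, so the separator does not have to be repeated on every invocation.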
