How to convert CSV to Excel, adding header rows between different groups of data, using a shell script?

Question

I want to process a CSV file line by line and, whenever table_name changes, add a header row.

Sample CSV:

table_name,no.,data 
attribute,column_name,definition,data_type,valid_values,notes
archive_rule,1,ID,id,,int,,
archive_rule,2,EXECUTE SEQ,execute_seq,,int,,
archive_rule,3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
archive_rule,4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
archive_rule,5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
archive_rule,6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
archive_rule,7,ACTIVE STATUS,active_status,,varchar,,
archive_table,1,ID,id,,int,,
archive_table,2,ARCHIVE RULE ID,archive_rule_id,,int,,
archive_table,3,EXECUTE SEQ,execute_seq,,int,,
archive_table,4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
archive_table,5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
archive_table,6,ACTIVE STATUS,active_status,,varchar,,
batch_job,1,BATCH JOB ID,batch_job_id,,int,,
batch_job,2,JOB TYPE,job_type,,varchar,,
batch_job,3,JOB NAME,job_name,,varchar,,
batch_job,4,EXECUTION DATE,execution_date,,timestamp,,
batch_job,5,EXECUTION RESULT,execution_result,,varchar,,
batch_job,6,ERROR MESSAGE,error_message,,varchar,,
batch_job,7,REPORT OUTPUT,report_output,,varchar,,

Desired Result:

Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
...
Data: archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
3,EXECUTE SEQ,execute_seq,,int,,
4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
...

Please help me find a way to produce this output.

Serge Ballesta · Accepted Answer · 2019-02-27 13:18:25Z

I can only imagine one way here: read the input file line by line, and use cut to extract the first field. This should do the trick:

#! /bin/bash

# accept both "process.sh file" and "process.sh < file"
if [ $# -eq 1 ]
then file="$1"
else file=-
fi

# initialize the table name to the empty string
cur=""

# process the input line by line after skipping the two header lines
# (cat reads standard input when given "-"; read stops the loop at end of file)
cat "$file" | tail -n +3 | while IFS= read -r line
do
    tab=$(printf '%s\n' "$line" | cut -f 1 -d,)   # extract the table name
    if [ "x$tab" != "x$cur" ]
    then
        cur=$tab                     # if it is a new one, remember it
        echo "Data: $tab"            # and write the header
        echo "no.,data attribute,column_name,definition,data_type,valid_values,notes"
    fi
    printf '%s\n' "$line" | cut -f 2- -d,   # copy all except the first field
done
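The script above starts two cut processes for every row, which can become slow on large files. A minimal sketch of the same loop, assuming the same input layout (two header lines, then table_name,... rows), that splits off the first field with POSIX parameter expansion instead. The function name group_rows is hypothetical, and unlike the script it only reads standard input:

```shell
# Hypothetical helper: same grouping as the script above, but the table name
# is split off with POSIX parameter expansion instead of running cut twice
# per line. Reads standard input; assumes two header lines, then data rows.
group_rows() {
    local cur=""
    # skip the two header lines, then read each row verbatim
    # (the extra test also keeps a last line that has no trailing newline)
    tail -n +3 | while IFS= read -r line || [ -n "$line" ]
    do
        tab=${line%%,*}              # text before the first comma: table name
        if [ "$tab" != "$cur" ]
        then
            cur=$tab                 # remember the new table name
            echo "Data: $tab"        # and write the group header
            echo "no.,data attribute,column_name,definition,data_type,valid_values,notes"
        fi
        printf '%s\n' "${line#*,}"   # text after the first comma
    done
}
```

For example: group_rows < file > result.csv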

That said, I would use a real scripting language such as Ruby or Python for this.

James Brown · Accepted Answer · 2019-02-27 16:03:48Z


Using awk:

$ awk '
BEGIN { FS=OFS="," }                       # set field separators
NR==1 {                                    # first record, start building the header
    h=$2 OFS $3
    next
}
NR==2 {                                    # second record, continue header construct
    h=h $0                                 # record 1 ends in a space, giving "data attribute"
    next
} 
$1!=p {                                    # when the table name changes
    print "Data : " $1                     # print table name
    print h                                # and header
}
{
    for(i=2;i<=NF;i++)                     # print fields 2->
        printf "%s%s",$i,(i==NF?ORS:OFS)   # field separator or newline
    p=$1                                   # remember the table name for next record
}' file

Output:

Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
...
Data : archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
...
Data : batch_job
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,BATCH JOB ID,batch_job_id,,int,,
2,JOB TYPE,job_type,,varchar,,
...
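Both answers assume that all rows of a table are adjacent, as they are in the sample; if a table's rows were scattered through the file, its header would be printed more than once. A minimal sketch of an awk variant that buffers rows per table and prints each table once, in first-seen order, assuming the same two-line header (group_any_order is a hypothetical name):

```shell
# Hypothetical helper: like the awk answer above, but rows are collected per
# table first, so a table whose rows are not adjacent still gets one header.
group_any_order() {
    awk '
    BEGIN { FS = OFS = "," }
    NR == 1 { h = $2 OFS $3; next }   # "no.,data " (trailing space kept)
    NR == 2 { h = h $0; next }        # complete the header
    {
        if (!($1 in rows))            # first time this table is seen:
            order[++n] = $1           # remember its position
        line = $2
        for (i = 3; i <= NF; i++)     # rebuild fields 2..NF
            line = line OFS $i
        rows[$1] = rows[$1] line ORS  # append the row to its table
    }
    END {
        for (k = 1; k <= n; k++) {    # print tables in first-seen order
            print "Data : " order[k]
            print h
            printf "%s", rows[order[k]]
        }
    }'
}
```

The trade-off is that the whole input is held in memory until the end, whereas the original answer streams row by row.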

edited Feb 27, 2019 at 16:03

answered Feb 27, 2019 at 13:42

James Brown


2 Comments

HSU WAI Over a year ago

Hello James, thanks for your answer. However, I want this result in Excel format, column by column. Is there any way to get it like this?

James Brown Over a year ago

Yes, the output is not the same: I presented a shortened version of the output for the given input. Is the output in the original post exactly the desired output, i.e. are archive_rule,7... and archive_table,6... really replaced with ... and so on? If so, I have misunderstood the requirement, and I'd like a more specific description of the output.
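On the Excel request: the output of both answers is comma-separated text, so saving it to a file with a .csv extension lets Excel open it with one field per column (depending on the regional list-separator setting, Excel may expect semicolons instead). Producing a native .xlsx workbook from a shell script would need an additional tool. If one file per table is more convenient, for example to import each table into its own worksheet, a minimal sketch assuming the same two-line header; the function name split_tables and the <table_name>.csv file naming are hypothetical:

```shell
# Hypothetical helper: write each table to its own file <dir>/<table_name>.csv,
# with the column header as the first row. The directory is assumed to exist.
split_tables() {
    awk -v dir="$1" '
    BEGIN { FS = OFS = "," }
    NR == 1 { h = $2 OFS $3; next }   # first header line: "no.,data "
    NR == 2 { h = h $0; next }        # second header line completes it
    {
        out = dir "/" $1 ".csv"       # output file named after the table
        if (!(out in seen)) {         # first row for this table:
            seen[out] = 1
            print h > out             # write the header once
        }
        line = $2
        for (i = 3; i <= NF; i++)     # rebuild fields 2..NF
            line = line OFS $i
        print line > out              # later prints append within this run
    }'
}
```

For example: mkdir out && split_tables out < file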
