I want to process a CSV file line by line and, whenever the table_name changes, add a header row.

Sample CSV:

table_name,no.,data 
attribute,column_name,definition,data_type,valid_values,notes
archive_rule,1,ID,id,,int,,
archive_rule,2,EXECUTE SEQ,execute_seq,,int,,
archive_rule,3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
archive_rule,4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
archive_rule,5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
archive_rule,6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
archive_rule,7,ACTIVE STATUS,active_status,,varchar,,
archive_table,1,ID,id,,int,,
archive_table,2,ARCHIVE RULE ID,archive_rule_id,,int,,
archive_table,3,EXECUTE SEQ,execute_seq,,int,,
archive_table,4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
archive_table,5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
archive_table,6,ACTIVE STATUS,active_status,,varchar,,
batch_job,1,BATCH JOB ID,batch_job_id,,int,,
batch_job,2,JOB TYPE,job_type,,varchar,,
batch_job,3,JOB NAME,job_name,,varchar,,
batch_job,4,EXECUTION DATE,execution_date,,timestamp,,
batch_job,5,EXECUTION RESULT,execution_result,,varchar,,
batch_job,6,ERROR MESSAGE,error_message,,varchar,,
batch_job,7,REPORT OUTPUT,report_output,,varchar,,

Desired Result:

Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
3,ARCHIVE RULE NAME,archive_rule_name,,varchar,,
4,ARCHIVE RULE TABLE NAME,archive_rule_table_name,,varchar,,
5,ARCHIVE RULE PK NAME,archive_rule_pk_name,,varchar,,
6,ARCHIVE BATCH SIZE,archive_batch_size,,int,,
...
Data: archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
3,EXECUTE SEQ,execute_seq,,int,,
4,ARCHIVE DEPEND TABLE ID,archive_depend_table_id,,int,,
5,ARCHIVE DEPEND LEVEL,archive_depend_level,,int,,
...

Please help me find a way to produce this output.

2 Answers

One way to do this: read the input file line by line and use cut to extract the first field. This should do the trick:

#! /bin/bash

# accept both process.sh file and process.sh < file
if [ $# -eq 1 ]
then file="$1"
else file=-
fi

#initialize table name to the empty string
cur=""

# process the input line by line after skipping the header
cat "$file" | tail -n +3 | (
while IFS= read -r line                  # loop ends on end of file or error
do
    tab=$( echo "$line" | cut -f 1 -d, )   # extract table name
    if [ "x$tab" != "x$cur" ]
    then
        cur=$tab                     # if a new one remember it
        echo "Data: $tab"            # and write header
        echo "no.,data attribute,column_name,definition,data_type,valid_values,notes"
    fi
    echo "$line" | cut -f 2- -d,         # copy all except first field
done )

But I would use a real scripting language such as Ruby or Python here...
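
If staying in the shell, the same line-by-line idea can probably be done without starting two cut processes per row, by splitting each line with parameter expansion instead. This is only a sketch under the same assumptions as the script above (two physical header lines, table name in the first field); the function name group_rows is made up for illustration.

```shell
# Hypothetical helper: reads the CSV on stdin, writes grouped output to stdout.
group_rows() {
    local cur="" line tab rest
    # skip the two physical header lines, as the script above does
    tail -n +3 | while IFS= read -r line || [ -n "$line" ]
    do
        [ -z "$line" ] && continue      # ignore blank lines
        tab=${line%%,*}                 # text before the first comma
        rest=${line#*,}                 # text after the first comma
        if [ "$tab" != "$cur" ]
        then
            cur=$tab                    # remember the new table name
            printf 'Data: %s\n' "$tab"
            printf '%s\n' "no.,data attribute,column_name,definition,data_type,valid_values,notes"
        fi
        printf '%s\n' "$rest"
    done
}
```

It would be called as group_rows < file. Like the cut version, it assumes no field contains a quoted comma.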

Using awk:

$ awk '
BEGIN { FS=OFS="," }                       # set field separators
NR==1 {                                    # first record, start building the header
    h=$2 OFS $3
    next
}
NR==2 {                                    # second record, continue header construct
    h=h $0                                 # space was in the end of record NR==1
    next
} 
$1!=p {                                    # when the table name changes
    print "Data : " $1                     # print table name
    print h                                # and header
}
{
    for(i=2;i<=NF;i++)                     # print fields 2->
        printf "%s%s",$i,(i==NF?ORS:OFS)   # field separator or newline
    p=$1                                   # remember the table name for next record
}' file

Output:

Data : archive_rule
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,EXECUTE SEQ,execute_seq,,int,,
...
Data : archive_table
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,ID,id,,int,,
2,ARCHIVE RULE ID,archive_rule_id,,int,,
...
Data : batch_job
no.,data attribute,column_name,definition,data_type,valid_values,notes
1,BATCH JOB ID,batch_job_id,,int,,
2,JOB TYPE,job_type,,varchar,,
...
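
The loop over fields 2..NF in the program above could also be replaced by deleting the first field from the record with sub(). A sketch of that variant, wrapped in a shell function whose name (group_tables) is invented here for illustration; it builds the header from the first two input lines in the same way:

```shell
# Hypothetical wrapper: pass a file name, or pipe the CSV in on stdin.
group_tables() {
    awk '
    BEGIN { FS = "," }
    NR == 1 { h = $2 FS $3; next }     # "no.,data " (trailing space kept)
    NR == 2 { h = h $0; next }         # append the rest of the header
    $1 != p {                          # table name changed
        print "Data : " $1
        print h
        p = $1                         # remember it for the next record
    }
    { sub(/^[^,]*,/, ""); print }      # drop the first field and its comma
    ' "$@"
}
```

Usage would be group_tables file. Because sub() edits the record as text, empty trailing fields are kept exactly as they appear in the input.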

2 Comments

Hello James, thanks for your answer. However, I want this result in Excel format, column by column. Is there any way to get that?
Output is not the same, yes. I presented a shortened version of the output for the given input. Is the output presented in the original post the exact desired output, i.e. are archive_rule,7... and archive_table,6... replaced with ... etc.? In that case I have misunderstood the requirement and would like a more specific description of the output.
