--pipe is inefficient (though not at the scale you are measuring; something is very wrong on your system). It can deliver on the order of 1 GB/s (total).
--pipepart is, on the contrary, highly efficient. It can deliver on the order of 1 GB/s per core, provided your disk is fast enough. This should be the most efficient way of processing data.txt1. It will split data.txt1 into one block per CPU core and feed those blocks into a wc -l running on each core:
parallel --block -1 --pipepart -a data.txt1 wc -l
You need version 20161222 or later for --block -1 to work.
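You can check your installed version with parallel --version; the first line of its output contains the release date. Also note that the command above prints one line count per block, so to get a single total you need to sum the partial counts, just as the examples below do with awk. A minimal sketch, assuming data.txt1 exists as in the question:

$ parallel --block -1 --pipepart -a data.txt1 wc -l | awk '{s+=$1} END {print s}'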
These are timings from my old dual-core laptop. seq 200000000 generates 1.8 GB of data.
$ time seq 200000000 | LANG=C wc -c
1888888898
real 0m7.072s
user 0m3.612s
sys 0m2.444s
$ time seq 200000000 | parallel --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 1m28.101s
user 0m25.892s
sys 0m40.672s
The time here is mostly due to GNU Parallel spawning a new wc -c for each 1 MB block: 1.8 GB in 1 MB blocks means roughly 1800 invocations. Increasing the block size makes it faster:
$ time seq 200000000 | parallel --block 10m --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m26.269s
user 0m8.988s
sys 0m11.920s
$ time seq 200000000 | parallel --block 30m --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m21.628s
user 0m7.636s
sys 0m9.516s
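Following the same pattern you can keep increasing the block size until the per-block spawn overhead is amortized. A sketch only; the timings will depend on your machine:

$ time seq 200000000 | parallel --block 100m --pipe LANG=C wc -c | awk '{s+=$1} END {print s}'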
As mentioned, --pipepart is much faster if you have the data in a file:
$ seq 200000000 > data.txt1
$ time parallel --block -1 --pipepart -a data.txt1 LANG=C wc -c | awk '{s+=$1} END {print s}'
1888888898
real 0m2.242s
user 0m0.424s
sys 0m2.880s
So on my old laptop I can process 1.8 GB in 2.2 seconds.
If you have only one core and your work is CPU-bound, then parallelizing will not help you. Parallelizing on a single-core machine can make sense if most of the time is spent waiting (e.g. waiting for the network).
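For instance, a hedged sketch of network-bound parallelization (urls.txt is a hypothetical file with one URL per line):

$ parallel -j 8 wget -q {} < urls.txt

Here the 8 wget processes spend most of their time waiting for the network, so they can overlap even on a single core.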
However, the timings from your computer tell me something is very wrong with it. I recommend you test your program on another computer.