
I'm writing a small server that uses Protocol Buffers to encode some data.

  1. TCP Socket is opened between Android Client and Python Server

  2. Android Client sends string for processing as normal newline delimited utf-8.

  3. Python Server does some processing to generate a response, which is an array of int arrays: [[int]]. This is encoded using the following protocol buffer definition:

syntax = "proto2";

package tts;

message SentenceContainer {
    repeated Sentence sentence = 1;
}

message Sentence {
    repeated uint32 phonemeSymbol = 1;
}

It gets loaded into this structure and sent as follows...

container = ttsSentences_pb2.SentenceContainer()
for sentence in input_sentences:
    phonemes = container.sentence.add()
    # Add all the phonemes to the phoneme list
    phonemes.phonemeSymbol.extend(processor.text_to_sequence(sentence))


payload = container.SerializeToString()
client.send(payload)

  4. Android Client receives the Protocol Buffer encoded message and tries to decode it.

This is where I'm stuck...

// I get the InputStream when the TCP connection is first opened
bufferIn = socket.getInputStream();
TtsSentences.SentenceContainer sentences = TtsSentences.SentenceContainer.parseDelimitedFrom(bufferIn);

When receiving the message the client gets this exception:

E/TCP: Server Error
    com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
        at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:164)
        at com.google.protobuf.GeneratedMessageLite.parsePartialDelimitedFrom(GeneratedMessageLite.java:1527)
        at com.google.protobuf.GeneratedMessageLite.parseDelimitedFrom(GeneratedMessageLite.java:1496)
        at com.tensorspeech.tensorflowtts.TtsSentences$SentenceContainer.parseDelimitedFrom(TtsSentences.java:221)
        at com.tensorspeech.tensorflowtts.network.PersistentTcpClient.run(PersistentTcpClient.java:100)
        at com.tensorspeech.tensorflowtts.MainActivity.lambda$onCreate$0$MainActivity(MainActivity.java:71)
        at com.tensorspeech.tensorflowtts.-$$Lambda$MainActivity$NTUE8bAusaoF3UGkWb7-Jt806BY.run(Unknown Source:2)
        at java.lang.Thread.run(Thread.java:919)

I already know this happens because Protocol Buffers messages are not self-delimiting, but I'm not sure how to delimit them properly. I've tried appending a newline, client.send(payload + b'\n'), and prepending the message size in bytes, client.send(container.ByteSize().to_bytes(2, 'little') + payload), but I'm still not sure how to proceed.

It's a shame there's no documentation on how to use Protocol Buffer over TCP Sockets in Java...

  • It looks like you're only sending one message. Why use parseDelimitedFrom instead of parseFrom? But more to the point, the issue looks like how to delimit from Python instead of how to parse from Java. Commented Nov 3, 2020 at 20:25
  • Thanks for that. I think the problem is on both sides, first I need to know how to delimit from Python, but then also what buffer/input types I need on Java to then properly parse that delimiter. Commented Nov 3, 2020 at 21:30
  • Are you only sending one message? If so, then you don't have to delimit at all. Commented Nov 3, 2020 at 21:51
  • Yeah, but for some reason, it just blocks on parseFrom(bufferIn) indefinitely. Commented Nov 3, 2020 at 21:52
  • Does the connection get closed after the one message is sent? Commented Nov 3, 2020 at 21:53

1 Answer

OK, I worked this out...

In the case where you have a short-lived connection, the socket closing would signify the end of the payload, so no extra logic is required.

In my case, I have a long-lived connection, so closing the socket to signify the end of the payload wouldn't work.

With a Java Client & Server, you could get around this by using:

MessageLite.writeDelimitedTo(OutputStream)

then on the recipient side:

MessageLite.parseDelimitedFrom(InputStream).

Easy enough...

But in the Python API, there is no writeDelimitedTo() function, so we must recreate what writeDelimitedTo() does. Fortunately, it's simple: it prefixes the message with the payload size in bytes, encoded as a varint. In Python, the _VarintBytes helper from google.protobuf.internal.encoder produces that prefix.

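from google.protobuf.internal.encoder import _VarintBytes

client, _ = socket.accept()
payload = your_PB_item.SerializeToString()
# payload is a bytes object, so its size is len(payload), not payload.ByteSize()
size = len(payload)
# sendall() keeps writing until the whole prefix + payload has been sent
client.sendall(_VarintBytes(size) + payload)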
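
To show what that length prefix looks like on the wire, here is a minimal sketch that builds it by hand rather than calling the library's private _VarintBytes helper. The names encode_varint and frame are hypothetical, made up for this illustration; the byte layout is the base-128 varint used by the Protocol Buffers wire format.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F   # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its varint-encoded length."""
    return encode_varint(len(payload)) + payload
```

For a 300-byte payload this prefix is b'\xac\x02', whereas (300).to_bytes(2, 'little') gives b'\x2c\x01'. That mismatch would explain why a fixed two-byte size prefix is not understood by parseDelimitedFrom.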

Then on the Java recipient side...

bufferIn = socket.getInputStream();
yourPbItem message;

if ((message = yourPbItem.parseDelimitedFrom(bufferIn)) != null) {
    // Do stuff :)
}

This way, the Protocol Buffers library knows exactly how many bytes to read and stops reading from the InputStream once the message is complete, rather than blocking indefinitely.
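
As a rough illustration of what the receiving side does with that prefix, here is a sketch of a reader in Python. It is not the Java library's actual code; read_delimited is a hypothetical name, and the stream is assumed to be any blocking binary file-like object (for example io.BytesIO, or the result of socket.makefile('rb')).

```python
import io
from typing import BinaryIO, Optional


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    """Read one varint-length-prefixed payload; return None on clean EOF."""
    size = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # stream ended before a new message started
            raise EOFError("stream ended inside the length prefix")
        size |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            break  # high bit clear: this was the last prefix byte
        shift += 7
        if shift > 63:
            raise ValueError("length prefix is too long to be a valid varint")
    # Read exactly `size` bytes, then stop touching the stream
    payload = bytearray()
    while len(payload) < size:
        chunk = stream.read(size - len(payload))
        if not chunk:
            raise EOFError("stream ended inside the payload")
        payload.extend(chunk)
    return bytes(payload)


# Two framed messages back to back on one long-lived stream
demo = io.BytesIO(b"\x03abc\x02hi")
first = read_delimited(demo)
second = read_delimited(demo)
```

Because the reader consumes only the prefix plus the declared number of bytes, the next message on the same connection can be read with another call, and the returned bytes could then be handed to a message's ParseFromString().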
