Categories: All - schema - compression

by Mike Ton 3 years ago

ProtocolBuffers

Each data serialization format has its own advantages and drawbacks. JSON, for example, is widely accepted, flexible, and easily read by any language, but it has no schema enforcement, which makes validation difficult.

Niantic

pgo
holoholo_shared.proto
game_master.proto

message

GameMasterClientTemplateProto { ... }

// Items client attributes shared from GameMasterTemplateProto. This is the client version of the game master tuning data for items.

Google

Tutorial
C#

Compiling your protocol buffers

Defining your protocol format

Protocol Buffer Basics: C#

Use the C# protocol buffer API to write and read messages.

Use the protocol buffer compiler.

Define message formats in a .proto file.

Definition
A bit of history

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol.

As the system evolved, it acquired a number of other features and uses:

Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).

Automatically-generated serialization and deserialization code avoided the need for hand parsing.

Protocol buffers were designed to solve many of the problems of that hand-marshalled format:

Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.).

New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.

Why not just use XML?

XML is better for

XML is human-readable and human-editable

A protocol buffer is only meaningful if you have the message definition (the .proto file).

text-based document with markup (e.g. HTML)

Protocol buffers : you cannot easily interleave structure with text

Advantages over XML for serializing structured data.

generate data access classes that are easier to use programmatically

less ambiguous

20 to 100 times faster

3 to 10 times smaller

simpler
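To make the size claim more concrete, here is a minimal sketch that hand-encodes the Person fields used elsewhere in these notes (name, id, email) in the protocol buffer wire format and sets the result next to one possible XML rendering. The helper names (put_varint, put_key, encode_person, person_as_xml) and the XML layout are assumptions made for illustration; real code would use the classes generated by the protocol buffer compiler. The ratio depends heavily on the data, so one small record does not by itself reproduce the 3 to 10 times figure.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
void put_varint(std::vector<std::uint8_t>& out, std::uint64_t v) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// A field key is (field_number << 3) | wire_type, itself written as a varint.
void put_key(std::vector<std::uint8_t>& out, std::uint32_t field_number,
             std::uint32_t wire_type) {
    assert(field_number > 0 && wire_type < 8);
    put_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | wire_type);
}

// Wire type 2 (length-delimited): key, byte length, then the raw bytes.
void put_string(std::vector<std::uint8_t>& out, std::uint32_t field_number,
                const std::string& s) {
    put_key(out, field_number, 2);
    put_varint(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

// Wire type 0 (varint): key, then the value. An int32 is sign-extended to 64 bits.
void put_int32(std::vector<std::uint8_t>& out, std::uint32_t field_number,
               std::int32_t v) {
    put_key(out, field_number, 0);
    put_varint(out, static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}

// Hand-written stand-in for what a generated Person serializer would emit
// for fields 1 (name), 2 (id) and 3 (email).
std::vector<std::uint8_t> encode_person(const std::string& name, std::int32_t id,
                                        const std::string& email) {
    std::vector<std::uint8_t> out;
    put_string(out, 1, name);
    put_int32(out, 2, id);
    put_string(out, 3, email);
    return out;
}

// One plausible XML rendering of the same record (the layout is an assumption).
std::string person_as_xml(const std::string& name, std::int32_t id,
                          const std::string& email) {
    return "<person><name>" + name + "</name><id>" + std::to_string(id) +
           "</id><email>" + email + "</email></person>";
}
```

For the sample values John Doe, 1234 and jdoe@example.com, encode_person produces 31 bytes, noticeably fewer than the characters in the XML string. The field names never appear in the binary form; only the field numbers do, which is also why the .proto file is needed to interpret it.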

How do they work?

(read)

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes.

So, for instance, if your chosen language is C++, running the compiler on the Person example below generates a class called Person.

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing.

So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.
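The reason old binaries can ignore new fields is that every field on the wire carries its field number and a wire type, so a parser can skip over a field it does not recognise. The following is a minimal sketch of that idea, not the library's parser: OldPerson, read_varint and parse_old_person are hypothetical names, and the "old" reader only knows fields 1 (name) and 2 (id).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// What an older binary knows about: only name (field 1) and id (field 2).
struct OldPerson {
    std::string name;
    std::int32_t id = 0;
};

// Read one base-128 varint starting at pos. Returns false on truncated
// or over-long input.
bool read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                 std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) return false;
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Parse a message, keeping known fields and skipping unknown ones by wire type.
bool parse_old_person(const std::vector<std::uint8_t>& in, OldPerson& out) {
    OldPerson p;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint64_t key = 0;
        if (!read_varint(in, pos, key)) return false;
        const std::uint64_t field = key >> 3;
        const unsigned wire_type = static_cast<unsigned>(key & 7);
        if (wire_type == 0) {            // varint
            std::uint64_t v = 0;
            if (!read_varint(in, pos, v)) return false;
            if (field == 2) p.id = static_cast<std::int32_t>(v);
        } else if (wire_type == 2) {     // length-delimited
            std::uint64_t len = 0;
            if (!read_varint(in, pos, len) || len > in.size() - pos) return false;
            if (field == 1) p.name.assign(in.begin() + pos, in.begin() + pos + len);
            pos += static_cast<std::size_t>(len);
        } else if (wire_type == 1) {     // fixed 64-bit
            if (in.size() - pos < 8) return false;
            pos += 8;
        } else if (wire_type == 5) {     // fixed 32-bit
            if (in.size() - pos < 4) return false;
            pos += 4;
        } else {
            return false;                // wire types this sketch does not handle
        }
    }
    out = p;
    return true;
}
```

Feeding this reader a message that also contains field 3 (email) or some later addition still yields the name and id; the extra fields are stepped over because their length can be worked out from the wire type alone.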

You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages.

// Read a Person back from a binary file and print two of its fields.
fstream input("myfile", ios::in | ios::binary);
Person person;
person.ParseFromIstream(&input);
cout << "Name: " << person.name() << endl;
cout << "E-mail: " << person.email() << endl;

// Populate a Person and write it to a binary file.
Person person;
person.set_name("John Doe");
person.set_id(1234);
person.set_email("jdoe@example.com");
fstream output("myfile", ios::out | ios::binary);
person.SerializeToOstream(&output);

These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes.
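For readers without the compiler at hand, the accessor style can be pictured with a hand-written stand-in. PersonSketch below is a hypothetical class written for these notes; it only mirrors the naming pattern of the accessors (name(), set_name(), has_email(), clear_email()) and leaves out serialization, the nested PhoneNumber type, and everything else the real generated class contains.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hand-written illustration only; the real class comes from the protocol
// buffer compiler and is considerably larger.
class PersonSketch {
public:
    // required string name = 1;
    const std::string& name() const { return name_; }
    void set_name(std::string value) { name_ = std::move(value); }

    // required int32 id = 2;
    std::int32_t id() const { return id_; }
    void set_id(std::int32_t value) { id_ = value; }

    // optional string email = 3; an optional field also tracks presence.
    bool has_email() const { return has_email_; }
    const std::string& email() const { return email_; }
    void set_email(std::string value) {
        email_ = std::move(value);
        has_email_ = true;
    }
    void clear_email() {
        email_.clear();
        has_email_ = false;
    }

private:
    std::string name_;
    std::int32_t id_ = 0;
    std::string email_;
    bool has_email_ = false;
};
```

The point of the sketch is that calling code works with typed getters and setters rather than with raw bytes or string keys.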

(write)

Basic example of a .proto file that defines a message containing information about a person:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

repeated

???

optional

required

https://developers.google.com/protocol-buffers/docs/proto

// You can specify optional fields, required fields, and repeated fields.

Each protocol buffer message is a small logical record of information, containing a series of name-value pairs.

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files.

What are protocol buffers?

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

History
(evolution of data)

protobuf

Can't be read in a text editor because the data is serialized to a compact binary form

Code is generated automatically for you

3-10x smaller and 20-100x faster than XML

Schema can evolve over time in a safe manner

Data can be read across all programming languages

Documentation can be embedded in the schema

Schema is needed to generate code and read the data

defined by .proto text file

Compact automatically (binary encoding with field numbers and varints rather than general-purpose compression)

Human readable

JSON

No comments, metadata or documentation

JSON objects can be large in size because of repeated keys

Data has no schema enforcement

Easily shared over the network

Easily read by any language

Widely accepted across the web

Can take any form (arrays, nested elements)

Relational Tables / SQL

Definitions vary between databases : working across different databases is difficult

Data has to be flat to fit in a table

Data fits in a table

Data is fully typed

csv

cons

Column names aren't strictly enforced : may or may not exist

Parsing is tricky for data that contains commas

Data types have to be inferred : validation is an issue
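The comma problem is easy to demonstrate. The sketch below contrasts a naive split on every comma with a quote-aware split that follows the common convention of wrapping fields in double quotes and writing a literal quote as two quotes. The function names split_naive and split_quoted are made up for this example, and the quote-aware version handles a single line only (no embedded line breaks).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Naive approach: every comma starts a new field, even inside quotes.
std::vector<std::string> split_naive(const std::string& line) {
    std::vector<std::string> fields(1);
    for (char c : line) {
        if (c == ',') {
            fields.emplace_back();
        } else {
            fields.back() += c;
        }
    }
    return fields;
}

// Quote-aware approach: commas inside double quotes belong to the field,
// and "" inside a quoted field stands for one literal quote character.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> fields(1);
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (in_quotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                fields.back() += '"';   // escaped quote
                ++i;
            } else if (c == '"') {
                in_quotes = false;      // closing quote
            } else {
                fields.back() += c;
            }
        } else if (c == '"') {
            in_quotes = true;           // opening quote
        } else if (c == ',') {
            fields.emplace_back();
        } else {
            fields.back() += c;
        }
    }
    return fields;
}
```

On the line 1,"Doe, John",jdoe@example.com the naive version returns four fields, cutting the name in two, while the quote-aware version returns three. Even then every field comes back as a string, so the type of each column still has to be inferred.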

pros

Easy to make sense of

Easy to parse

Easy to read

https://developers.google.com/protocol-buffers/