Ceaseless Technology Updates: Parquet - Columnar Storage format for Hadoop

Friday, October 11, 2013

Parquet - Columnar Storage format for Hadoop

Parquet (http://parquet.io/) is a columnar storage format based on the "record shredding and assembly algorithm" described in Google's Dremel paper, and it looks like a good choice for efficient data storage.
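To make the columnar idea concrete, here is a minimal sketch in Python. It is not Parquet's actual encoding: it handles only flat records, whereas the Dremel algorithm also covers nested and repeated fields using repetition and definition levels. The function names `shred` and `assemble` and the sample records are made up for illustration.

```python
def shred(records, fields):
    """Split row-oriented records into one list of values per field."""
    return {name: [rec.get(name) for rec in records] for name in fields}


def assemble(columns):
    """Rebuild row-oriented records from the per-field value lists."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return [dict(zip(names, row)) for row in rows]


# Hypothetical sample data, not taken from any real dataset.
records = [
    {"id": 1, "country": "IN", "clicks": 10},
    {"id": 2, "country": "US", "clicks": 7},
    {"id": 3, "country": "IN", "clicks": 4},
]

columns = shred(records, ["id", "country", "clicks"])

# A query that needs only one field reads only that column's values.
total_clicks = sum(columns["clicks"])
```

Because the values of one field sit together, a query that touches a single field can skip the others, and values of the same type tend to compress well when stored side by side.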

The project is divided into two parts:

1. Parquet Format - contains the Thrift-based definitions for the storage format.
2. Parquet-MR - contains the MapReduce (Java) implementation of the Parquet format, with integrations for Hive, Avro, Hadoop, Pig and Cascading.

The best part is that all the definitions are written in Thrift, so the format can be implemented in other languages as well.
