Notes on AWS, Big Data, Machine Learning and Leadership: Presto

Monday, 12 March 2018

Presto

Developed at Facebook (Hive was too slow for them)
Does not use MapReduce
Does not store intermediate results on disk (in-memory only)
- Does not "spill" to disk
Pipelined execution (all stages run at the same time)
- Example: as soon as it discovers a file it starts processing it, even though other files have not been touched yet; great for "LIMIT 10" queries
It can stream results back as they are produced
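A minimal sketch of the pipelined, streaming idea, using Python generators as a stand-in for query operators (the names `scan`, `filter_even` and `limit`, and the in-memory "files", are hypothetical). Each stage pulls rows from the one before it only on demand, so a LIMIT stops the scan after the first file instead of reading every file first.

```python
from typing import Dict, Iterable, Iterator, List

def scan(files: Dict[str, List[int]], touched: List[str]) -> Iterator[int]:
    """Hypothetical table scan: opens one file at a time, only when rows are requested."""
    for name, rows in files.items():
        touched.append(name)  # record which files were actually opened
        yield from rows

def filter_even(rows: Iterable[int]) -> Iterator[int]:
    """Filter operator: passes rows downstream as soon as they qualify."""
    for row in rows:
        if row % 2 == 0:
            yield row

def limit(rows: Iterable[int], n: int) -> Iterator[int]:
    """LIMIT operator: stops pulling from upstream once n rows are emitted."""
    if n <= 0:
        return
    for count, row in enumerate(rows, start=1):
        yield row
        if count >= n:
            return

files = {"part-0": [1, 2, 3, 4], "part-1": [5, 6, 7, 8], "part-2": [9, 10]}
touched: List[str] = []
# Roughly: SELECT x FROM t WHERE x % 2 = 0 LIMIT 2
result = list(limit(filter_even(scan(files, touched)), 2))
```

Because every stage is lazy, rows can be handed to the client one at a time; a batch engine that finishes each stage over all files before starting the next would have opened all three files here.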
Implemented in Java
- Generates bytecode directly
- Manages its own memory (to avoid garbage collection pauses)
  - Flat memory (values packed into contiguous arrays rather than many small Java objects)
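Presto generates JVM bytecode; the sketch below is only a Python analogue of the same idea, built on a hypothetical tuple-based expression tree. Instead of walking the tree for every row (`interpret`), it turns the expression into source once and compiles it to CPython bytecode with the built-in `compile` (`compile_expr`), so per-row evaluation no longer pays for dispatching on each node.

```python
import operator
from typing import Any, Callable, Dict, Sequence

# Hypothetical expression tree: ("col", index), ("const", value), (op, left, right)
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}

def interpret(expr: tuple, row: Sequence[Any]) -> Any:
    """Tree-walking interpreter: re-dispatches on every node for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def to_source(expr: tuple) -> str:
    """Render the tree as a single Python expression string."""
    kind = expr[0]
    if kind == "col":
        return f"row[{int(expr[1])}]"
    if kind == "const":
        return repr(expr[1])
    if kind not in OPS:
        raise ValueError(f"unsupported operator: {kind}")
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"

def compile_expr(expr: tuple) -> Callable[[Sequence[Any]], Any]:
    """Generate specialized source once, compile it to bytecode, return a function."""
    src = f"def _generated(row):\n    return {to_source(expr)}\n"
    namespace: Dict[str, Any] = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["_generated"]

# price * quantity > 100
expr = (">", ("*", ("col", 0), ("col", 1)), ("const", 100))
rows = [(10, 5), (30, 4), (7, 20)]
fast = compile_expr(expr)
interpreted = [interpret(expr, r) for r in rows]
compiled = [fast(r) for r in rows]
```

Both paths return the same answers; the difference is that the compiled function does the tree traversal once, up front. The source is generated only from a trusted tree, since `exec` on arbitrary user strings would be unsafe.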
Most of the time is spent reading, parsing and deserializing data into the internal in-memory format
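A rough Python analogue of flat memory and of the deserialization step (the `id,amount` text format and the `deserialize` function are assumptions for illustration, not Presto's actual format): text rows are parsed once into contiguous, typed `array` buffers, one per column, rather than a list of per-row objects. The parsing loop is the per-value work the note says dominates; once the columns are flat, aggregating over them is a simple scan.

```python
from array import array
from typing import Iterable, Tuple

def deserialize(lines: Iterable[str]) -> Tuple[array, array]:
    """Parse hypothetical 'id,amount' text rows into two flat, typed column buffers."""
    ids = array("q")      # contiguous signed 64-bit integers
    amounts = array("d")  # contiguous 64-bit floats
    for line in lines:
        # Splitting and converting text is the expensive per-row part
        id_text, amount_text = line.strip().split(",")
        ids.append(int(id_text))
        amounts.append(float(amount_text))
    return ids, amounts

raw = ["1,9.5", "2,20.25", "3,0.25"]
ids, amounts = deserialize(raw)
total = sum(amounts)  # aggregation just scans one flat buffer
```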
