Monday, 12 March 2018

Presto

  • Developed at Facebook, where Hive was too slow for interactive queries
  • Does not use MapReduce
  • Does not store intermediate results on disk (in-memory only)
    • Does not "spill" intermediate data to disk
  • Pipelined execution (all stages run at once)
    • Example: as soon as it discovers a file it starts executing, even though the other files have not been "touched" yet; great for "LIMIT 10" queries
  • It can stream results
  • Implemented in Java
    • Generates bytecode directly
    • Manages its own memory (to avoid garbage collection)
      • Flat memory
  • Most of the query time is spent reading, parsing, and deserializing data into the internal memory format
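
To make the pipelining point concrete, here is a toy sketch in Python. It is not Presto code; the names discover_files, scan, limit, and run_limit_query are made up for illustration, and the "files" are plain in-memory ranges. It only shows the idea: when every stage pulls rows lazily from the stage before it, a LIMIT query can finish without ever touching the later files.

```python
from itertools import islice


def discover_files(catalog, opened):
    """Lazily yield (name, rows) pairs; a file is 'touched' only when reached."""
    for name, rows in catalog:
        opened.append(name)  # record which files were actually opened
        yield name, rows


def scan(files):
    """Stream rows one at a time instead of materializing whole files."""
    for _name, rows in files:
        for row in rows:
            yield row


def limit(rows, n):
    """Stop pulling from upstream once n rows have been produced."""
    return islice(rows, n)


def run_limit_query(catalog, n):
    """Hypothetical 'SELECT * FROM t LIMIT n' over a list of fake files."""
    opened = []
    result = list(limit(scan(discover_files(catalog, opened)), n))
    return result, opened


# Three fake "files" of 100 rows each (illustrative data only).
catalog = [
    ("part-0", range(0, 100)),
    ("part-1", range(100, 200)),
    ("part-2", range(200, 300)),
]

result, opened = run_limit_query(catalog, 10)
print(result)  # the first 10 rows
print(opened)  # only the files that had to be opened
```

With a limit of 10, only the first fake file is opened; the other two are never read. A stage-by-stage engine that writes each stage's full output before starting the next would have had to read all three.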

AWS
  • Used by Athena
  • Can be run on an EMR cluster (bypassing Hadoop)
