Working with big data can be challenging because of the performance overhead of moving data between the different tools and systems that make up a data processing pipeline. Programming languages, file formats and network protocols each represent the same data differently in memory, so data may need to be serialized and deserialized into a new representation at nearly every step of a pipeline, which makes processing large amounts of data slower and more costly in terms of hardware.
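As a rough illustration of this overhead, the sketch below (in Python, using the standard-library `pickle` module as a stand-in for whatever serialization format a given pipeline hop might use) times a single serialize/deserialize round trip of a modest in-memory dataset. The specific format and dataset are illustrative assumptions, not drawn from any particular system; the point is that every hop that changes representation pays CPU time and memory copies like this.

```python
import pickle
import time

# A modest in-memory dataset: one million integers.
data = list(range(1_000_000))

# Serializing converts the in-memory objects to bytes; deserializing
# rebuilds them. A pipeline that crosses tool or format boundaries may
# pay this cost repeatedly, once per hop.
start = time.perf_counter()
blob = pickle.dumps(data)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

print(f"serialized size: {len(blob)} bytes")
print(f"round-trip time: {elapsed:.3f} s")

# The data survives intact, but CPU and memory were spent copying it.
assert restored == data
```

Multiply one such round trip by the number of tools in a pipeline and by datasets far larger than this one, and the cost of repeated re-serialization becomes a dominant factor.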