When Microsoft Fabric was released, it came with Apache Spark out of the box. Spark's support for multiple programming languages opened up possibilities for building data-driven, automated Lakehouses.
Although the Native Execution Engine in Fabric Spark makes Spark, traditionally oriented towards big data, competitive for small data workloads as well, there are still scenarios where pure Python is a better fit.
With Python Notebooks, we can augment Spark with the best of the open-source ecosystem to deliver lightweight, performant data processing solutions.
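As a taste of what that can look like, here is a minimal sketch of reading a Lakehouse Delta table with Polars in a Python Notebook, with no Spark session involved. It assumes the `polars` and `deltalake` packages are available, that the default Lakehouse is mounted at `/lakehouse/default` (the Fabric notebook convention), and a hypothetical `sales` table with `order_date` and `amount` columns:

```python
# Read a Lakehouse Delta table and aggregate it with Polars,
# entirely without a Spark session. The table and column names
# below are hypothetical placeholders.
import polars as pl

# Fabric notebooks mount the default Lakehouse under /lakehouse/default
sales = pl.read_delta("/lakehouse/default/Tables/sales")

daily_totals = (
    sales
    .group_by("order_date")
    .agg(pl.col("amount").sum().alias("total_amount"))
    .sort("order_date")
)
print(daily_totals)
```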
We will cover:
- The difference between Python Notebooks and single-node Spark.
- When to use Python Notebooks and when to use Spark Notebooks.
- Where to use Python Notebooks in a metadata-driven Lakehouse.
- A brief introduction to tooling and moving workloads between Python Notebooks and Spark Notebooks.
- How to avoid overloading the Lakehouse tech stack with Python technologies, with an introduction to Apache Arrow (see the sketch after this list).
- Costs.
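
To give a flavour of the Apache Arrow point, the sketch below shows Arrow as the shared in-memory format that lets several Python engines work on the same table without copying it through pandas. It assumes `pyarrow`, `polars`, and `duckdb` are installed; the data is made up:

```python
# Apache Arrow as the common in-memory format: one table,
# handed between engines without intermediate copies.
import pyarrow as pa
import polars as pl
import duckdb

# A small in-memory Arrow table (made-up data)
orders = pa.table({
    "region": ["EU", "EU", "US"],
    "amount": [100.0, 250.0, 75.0],
})

# Polars wraps the Arrow buffers directly (zero-copy where possible)
df = pl.from_arrow(orders)

# DuckDB queries the same Arrow table in place via a replacement scan,
# and hands its result back as Arrow again
totals = duckdb.sql(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
).arrow()

print(df.shape, totals.num_rows)
```

Because each engine reads the same Arrow buffers, mixing libraries does not multiply memory usage or serialization cost, which is the argument for standardizing the stack on Arrow-native tools.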
After this session, attendees will understand how to apply both Python Notebooks and Spark Notebooks to get the most out of Fabric for data processing.
