How should you determine partitioning strategy for a lakehouse table in Fabric?

Prepare for the DP-700 Microsoft Fabric Data Engineer Exam with flashcards and multiple choice questions. Study with hints and explanations, and ensure success on your certification exam!

Explanation:
Partitioning a lakehouse table in Fabric should be guided by how you filter and retain data. Partition by columns that queries commonly filter on, typically a time-based column such as a date, so the query engine can prune partitions and read only the relevant data. Time-based partitions also suit retention and purge patterns, because you can drop whole partitions to remove old data efficiently. Prefer keys of low to moderate cardinality: a column with too many distinct values (such as a customer or order ID) causes an explosion of tiny partitions and small files, adding metadata and file-management overhead, while a column with too few values leaves a handful of massive partitions that give pruning little to work with. The key is balancing partition count: enough partitions to enable pruning, but not so many that metadata and small-file costs outweigh the benefits.
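
The pruning, retention, and sizing ideas above can be illustrated with a toy in-memory table. This is a minimal Python sketch, not Fabric or Delta Lake code: the sample data, the column names (order_date, customer_id), and the helper functions are all hypothetical, and a real engine prunes using partition values recorded in table metadata rather than Python dictionaries.

```python
from collections import defaultdict
from datetime import date, timedelta

def partition_rows(rows, key):
    """Group rows into partitions by the value of `key` (one "folder" per value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

def prune(partitions, predicate):
    """Keep only partitions whose value satisfies the filter; the rest are never read."""
    return {value: rows for value, rows in partitions.items() if predicate(value)}

def drop_older_than(partitions, cutoff):
    """Retention: remove whole date partitions older than `cutoff` instead of deleting row by row."""
    return {value: rows for value, rows in partitions.items() if value >= cutoff}

def average_partition_size(partitions):
    """Average rows per partition; a value near 1 signals far too many tiny partitions."""
    return sum(len(rows) for rows in partitions.values()) / len(partitions)

# Hypothetical sample: 3,000 orders spread evenly over 30 days, one customer per order.
start = date(2024, 1, 1)
rows = [
    {"order_id": i, "customer_id": i, "order_date": start + timedelta(days=i % 30)}
    for i in range(3000)
]

by_date = partition_rows(rows, "order_date")
by_customer = partition_rows(rows, "customer_id")

# A one-week filter touches only 7 of the 30 date partitions.
week = prune(by_date, lambda d: date(2024, 1, 8) <= d < date(2024, 1, 15))
print(len(by_date), len(week))  # 30 7

# Keeping only the last 10 days means dropping 20 whole partitions.
retained = drop_older_than(by_date, date(2024, 1, 21))
print(len(retained))  # 10

# Date partitions hold 100 rows each; partitioning by a unique ID leaves 1 row each.
print(average_partition_size(by_date), average_partition_size(by_customer))  # 100.0 1.0
```

The contrast between by_date and by_customer is the balancing act in miniature: both layouts hold the same rows, but the near-unique key produces one partition per row, which is exactly the small-file overhead the explanation warns about.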

The other options support pruning less effectively. Not partitioning at all rules out partition pruning, so filtered queries on large lakehouse tables can end up scanning most or all of the data. Partitioning alphabetically or by hash alone rarely matches common query predicates: a hash bucket says nothing about the date range or category a query filters on, so no bucket can be skipped, and hashing can spread related rows across many partitions without helping pruning.
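
To see why a hash key or no partitioning at all helps less for a typical date filter, the sketch below counts the rows read by a one-week query under three layouts. It is a hypothetical Python model, not Fabric code: the modulo on order_id stands in for a hash function, and the data and column names are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def partition_rows(rows, key_fn):
    """Group rows by the partition value computed by key_fn."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_fn(row)].append(row)
    return dict(partitions)

def rows_scanned(partitions, could_match):
    """Rows read when only partitions that `could_match` the filter are opened."""
    return sum(len(rows) for value, rows in partitions.items() if could_match(value))

start = date(2024, 1, 1)
rows = [{"order_id": i, "order_date": start + timedelta(days=i % 30)} for i in range(3000)]
lo, hi = date(2024, 1, 8), date(2024, 1, 15)  # query filter: one week
matching = sum(1 for r in rows if lo <= r["order_date"] < hi)

# No partitioning: a single partition that must always be read.
unpartitioned = {None: rows}
full_scan = rows_scanned(unpartitioned, lambda _: True)

# Date partitioning: the filter maps directly onto partition values.
by_date = partition_rows(rows, lambda r: r["order_date"])
date_scan = rows_scanned(by_date, lambda d: lo <= d < hi)

# Hash-style bucketing on order_id: a bucket number carries no date
# information, so every bucket might hold matching rows and none can be skipped.
by_bucket = partition_rows(rows, lambda r: r["order_id"] % 8)
bucket_scan = rows_scanned(by_bucket, lambda _bucket: True)

print(matching, full_scan, date_scan, bucket_scan)  # 700 3000 700 3000
```

In this model every layout returns the same 700 matching rows; what differs is how much data must be read to find them.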
