Compare Fabric Data Engineering and Azure Synapse Spark

This article compares Azure Synapse Spark and Fabric Spark across Spark pools, configurations, libraries, notebooks, and Spark job definitions (SJD).

| Category | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Spark pools | Spark pool<br>-<br>- | Starter pool (pre-warmed) / Custom pool<br>V-Order<br>High concurrency |
| Spark configurations | Pool level<br>Notebook or Spark job definition level | Environment level<br>Notebook or Spark job definition level |
| Spark libraries | Workspace level packages<br>Pool level packages<br>Inline packages | -<br>Environment libraries<br>Inline libraries |
| Resources | Notebook (Python, Scala, Spark SQL, R, .NET)<br>Spark job definition (Python, Scala, .NET)<br>Synapse pipelines<br>Pipeline activities (notebook, Spark job definition) | Notebook (Python, Scala, Spark SQL, R)<br>Spark job definition (Python, Scala, R)<br>Data Factory pipelines<br>Pipeline activities (notebook, Spark job definition) |
| Data | Primary storage (ADLS Gen2)<br>Data residency (cluster/region based) | Primary storage (OneLake)<br>Data residency (capacity/region based) |
| Metadata | Internal Hive Metastore (HMS)<br>External HMS (using Azure SQL DB) | Internal HMS (lakehouse)<br>- |
| Connections | Connector type (linked services)<br>Data sources<br>Data source conn. with workspace identity | Connector type (Data Movement and Transformation Services)<br>Data sources<br>- |
| Security | RBAC and access control<br>Storage ACLs (ADLS Gen2)<br>Private Links<br>Managed virtual network (VNet) for network isolation<br>Synapse workspace identity<br>Data Exfiltration Protection (DEP)<br>Service tags<br>Key Vault (via mssparkutils / linked service) | RBAC and access control<br>OneLake RBAC<br>Private Links<br>Managed virtual network (VNet)<br>Workspace identity<br>-<br>Service tags<br>Key Vault (via notebookutils) |
| DevOps | Azure DevOps integration<br>CI/CD (no built-in support) | Azure DevOps integration<br>Deployment pipelines |
| Developer experience | IDE integration (IntelliJ)<br>Synapse Studio UI<br>Collaboration (workspaces)<br>Livy API<br>API/SDK<br>mssparkutils | IDE integration (VS Code)<br>Fabric UI<br>Collaboration (workspaces and sharing)<br>Livy API<br>API/SDK<br>notebookutils |
| Logging and monitoring | Spark Advisor<br>Built-in monitoring of pools and jobs (through Synapse Studio)<br>Spark history server<br>Prometheus/Grafana<br>Log Analytics<br>Storage Account<br>Event Hubs | Spark Advisor<br>Built-in monitoring of pools and jobs (through Monitoring hub)<br>Spark history server<br>-<br>Log Analytics<br>Storage Account<br>Event Hubs |
| Business continuity and disaster recovery (BCDR) | BCDR (data): ADLS Gen2 | BCDR (data): OneLake |

When to choose: Use Fabric Spark for unified analytics with OneLake storage, built-in CI/CD pipelines, and capacity-based scaling. Use Azure Synapse Spark when you need GPU-accelerated pools, external Hive Metastore, or JDBC connections.

Key limitations in Fabric

  • JDBC: Not supported
  • DMTS in notebooks: Data Movement and Transformation Services can't be used in notebooks or Spark job definitions
  • Managed identity for Key Vault: Not supported in notebooks
  • External Hive Metastore: Not supported
  • GPU-accelerated pools: Not available
  • .NET for Spark (C#): Not supported

More Fabric considerations

Spark pool comparison

The following table compares Azure Synapse Spark and Fabric Spark pools.

| Spark setting | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Live pool (pre-warmed instances) | - | Yes, starter pool |
| Custom pool | Yes | Yes |
| Spark versions (runtime) | 2.4, 3.1, 3.2, 3.3, 3.4 | 3.3, 3.4, 3.5 |
| Autoscale | Yes | Yes |
| Dynamic allocation of executors | Yes, up to 200 | Yes, based on capacity |
| Adjustable node sizes | Yes, 3-200 | Yes, 1 to capacity-based max |
| Minimum node configuration | 3 nodes | 1 node |
| Node size family | Memory Optimized, GPU accelerated | Memory Optimized |
| Node size | Small-XXXLarge | Small-XXLarge |
| Autopause | Yes, customizable (minimum 5 minutes) | Yes, noncustomizable (2 minutes) |
| High concurrency | No | Yes |
| V-Order | No | Yes |
| Spark autotune | No | Yes |
| Native Execution Engine | No | Yes |
| Concurrency limits | Fixed | Variable, based on capacity |
| Multiple Spark pools | Yes | Yes (environments) |
| Intelligent cache | Yes | Yes |
| API/SDK support | Yes | Yes |

When to choose: Use Fabric Spark pools for fast startup (starter pools), single-node jobs, high concurrency sessions, and V-Order optimization. Use Azure Synapse pools when you need GPU acceleration or fixed scaling up to 200 nodes.

Spark runtime versions

Fabric Spark supports runtimes based on Spark 3.3, 3.4, and 3.5. Fabric doesn't support Spark 2.4, 3.1, or 3.2.

Adjustable node sizes

Azure Synapse Spark pools scale up to 200 nodes regardless of node size. In Fabric, the maximum number of nodes depends on node size and provisioned capacity (SKU).

Fabric capacity conversion: 2 Spark vCores = 1 capacity unit. Formula: total Spark vCores in capacity ÷ vCores per node size = maximum nodes available.

For example, SKU F64 provides 64 capacity units (128 Spark vCores). The following table shows node limits for F64:

| Spark pool size | Azure Synapse Spark | Fabric Spark (custom pool, SKU F64) |
| --- | --- | --- |
| Small (4 vCores) | Min: 3, Max: 200 | Min: 1, Max: 32 |
| Medium (8 vCores) | Min: 3, Max: 200 | Min: 1, Max: 16 |
| Large (16 vCores) | Min: 3, Max: 200 | Min: 1, Max: 8 |
| X-Large (32 vCores) | Min: 3, Max: 200 | Min: 1, Max: 4 |
| XX-Large (64 vCores) | Min: 3, Max: 200 | Min: 1, Max: 2 |

For more information, see Spark compute.
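The node limits above follow directly from the capacity formula. As a minimal sketch (pure Python; the helper name and the constant for vCores per capacity unit come from the conversion stated earlier):

```python
# Sketch of the Fabric max-node calculation described above.
# Assumes 1 capacity unit (CU) = 2 Spark vCores, per the conversion in this article.
VCORES_PER_CAPACITY_UNIT = 2

def max_nodes(capacity_units: int, vcores_per_node: int) -> int:
    """Maximum nodes a custom pool can reach for a given capacity SKU."""
    total_vcores = capacity_units * VCORES_PER_CAPACITY_UNIT
    return total_vcores // vcores_per_node

# SKU F64 = 64 capacity units = 128 Spark vCores.
for name, vcores in [("Small", 4), ("Medium", 8), ("Large", 16),
                     ("X-Large", 32), ("XX-Large", 64)]:
    print(f"{name}: max {max_nodes(64, vcores)} nodes")
# Small: 32, Medium: 16, Large: 8, X-Large: 4, XX-Large: 2 — matching the table.
```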

Node sizes

Fabric Spark pools support only the Memory Optimized node family. GPU-accelerated pools available in Azure Synapse aren't supported in Fabric.

Node size comparison (XX-Large):

  • Azure Synapse: 432 GB memory
  • Fabric: 512 GB memory, 64 vCores

Node sizes Small through X-Large have identical vCores and memory in both Azure Synapse and Fabric.

Autopause behavior

Autopause settings comparison:

  • Azure Synapse: Configurable idle timeout, minimum 5 minutes
  • Fabric: Fixed 2-minute autopause after session expires (not configurable), default session timeout is 20 minutes

High concurrency

Fabric supports high concurrency mode for notebooks, allowing multiple users to share a single Spark session. Azure Synapse doesn't support this feature.

Concurrency limits

Azure Synapse Spark limits (fixed):

  • 50 concurrent jobs per pool, 200 queued jobs per pool
  • 250 active jobs per pool, 1,000 per workspace

Fabric Spark limits (SKU-based):

  • Concurrent jobs vary by capacity SKU: 1 to 512 max
  • Dynamic reserve-based throttling manages peak usage

For more information, see Concurrency limits and queueing in Microsoft Fabric Spark.

Multiple Spark pools

In Fabric, use environments to configure and select different Spark pools per notebook or Spark job definition.

Spark configurations comparison

Spark configurations apply at two levels:

  • Environment level: Default configuration for all Spark jobs in the environment
  • Inline level: Per-session configuration in notebooks or Spark job definitions

| Spark configuration | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Environment level | Yes, pools | Yes, environments |
| Inline | Yes | Yes |
| Import/export | Yes | Yes (.yml from environments) |
| API/SDK support | Yes | Yes |

When to choose: Both platforms support environment and inline configurations. Fabric uses environments instead of pool-level configs.

  • Inline syntax: In Fabric, use spark.conf.set(<conf_name>, <conf_value>) for session-level configs. For batch jobs, use SparkConf.
  • Immutable configs: Some Spark configurations can't be modified. Error message: AnalysisException: Can't modify the value of a Spark config: <config_name>
  • V-Order: Enabled by default in Fabric; write-time optimization for parquet files. See V-Order.
  • Optimized Write: Enabled by default in Fabric; disabled by default in Azure Synapse.

Spark libraries comparison

Spark libraries apply at three levels:

  • Workspace level: Available in Azure Synapse only
  • Environment level: Libraries available to all notebooks and Spark job definitions in the environment
  • Inline: Session-specific libraries installed at notebook startup

| Spark library | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Workspace level | Yes | No |
| Environment level | Yes, pools | Yes, environments |
| Inline | Yes | Yes |
| Import/export | Yes | Yes |
| API/SDK support | Yes | Yes |

When to choose: Both platforms support environment and inline libraries. Fabric doesn't support workspace-level packages.

  • Built-in libraries: Fabric and Azure Synapse runtimes share a common Spark core but differ in library versions. Some code might require recompilation or custom libraries. See Fabric runtime libraries.

Notebook comparison

Notebooks and Spark job definitions are primary code items for developing Apache Spark jobs in Fabric. There are some differences between Azure Synapse Spark notebooks and Fabric Spark notebooks:

| Notebook capability | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Import/export | Yes | Yes |
| Session configuration | Yes, UI and inline | Yes, UI (environment) and inline |
| IntelliSense | Yes | Yes |
| mssparkutils | Yes | Yes |
| Notebook resources | No | Yes |
| Collaborate | No | Yes |
| High concurrency | No | Yes |
| .NET for Spark (C#) | Yes | No |
| Pipeline activity support | Yes | Yes |
| Built-in scheduled run support | No | Yes |
| API/SDK support | Yes | Yes |

When to choose: Use Fabric notebooks for collaboration, high concurrency sessions, built-in scheduling, and notebook resources. Use Azure Synapse notebooks if you require .NET for Spark (C#).

  • notebookutils.credentials: Only getToken and getSecret are supported in Fabric (DMTS connections not available).
  • Notebook resources: Fabric provides a Unix-like file system for managing files. See How to use notebooks.
  • High concurrency: Alternative to ThreadPoolExecutor in Azure Synapse. See Configure high concurrency mode.
  • .NET for Spark: Migrate C#/F# workloads to Python or Scala.
  • Linked services: Replace with Spark libraries for external data source connections.

Spark job definition comparison

Important Spark job definition considerations:

| Spark job capability | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| PySpark | Yes | Yes |
| Scala | Yes | Yes |
| .NET for Spark (C#) | Yes | No |
| SparkR | No | Yes |
| Import/export | Yes (UI) | No |
| Pipeline activity support | Yes | Yes |
| Built-in scheduled run support | No | Yes |
| Retry policies | No | Yes |
| API/SDK support | Yes | Yes |

When to choose: Use Fabric Spark job definitions for SparkR support, built-in scheduling, and retry policies. Use Azure Synapse if you need .NET for Spark (C#) or UI-based import/export.

  • Supported files: .py, .R, and .jar files with reference files, command line arguments, and lakehouse references.
  • Import/export: UI-based JSON import/export available in Azure Synapse only.
  • Retry policies: Enable indefinite runs for Spark Structured Streaming jobs.
  • .NET for Spark: Migrate C#/F# workloads to Python or Scala.

Hive Metastore (HMS) comparison

| HMS type | Azure Synapse Spark | Fabric Spark |
| --- | --- | --- |
| Internal HMS | Yes | Yes (lakehouse) |
| External HMS | Yes | No |

When to choose: Use Fabric if lakehouse-based internal HMS meets your needs. Use Azure Synapse if you require external Hive Metastore (Azure SQL DB) or Catalog API access.