The Data Discovery Problem
Every department I engage with encounters the same underlying issue: data is plentiful, yet accessing the right information at the right moment—with sufficient confidence—is a persistent challenge. Analysts end up recreating datasets, dashboards proliferate uncontrollably, and engineers devote more effort to maintaining duplicates than extracting meaningful insights. Ironically, much of this data resides within the same cloud environment—yet it feels as inaccessible as if it were located continents apart.
Microsoft Fabric’s OneLake was designed to solve this at the platform level. But owning a unified lake is only half the story. The real capability lies in how you discover what lives in OneLake and how you connect to it — without duplicating, without re-engineering, and without losing governance.
This blog post is a practitioner’s deep dive into the full discovery-to-connection lifecycle in OneLake. We will cover:
• What OneLake is and why it matters as an organisational foundation
• The OneLake Catalog—your single pane of glass for data discovery
• Endorsement and Discoverability — building a culture of trusted data
• Shortcuts — connecting to data across clouds without duplication
• Mirroring — zero-ETL replication into OneLake
• Direct Lake — querying OneLake natively from Power BI
• Governance best practices that make all of the above sustainable
OneLake—One Lake for the Entire Organisation
Before we can discuss discovery, we need a clear mental model of what OneLake actually is.
OneLake is the single, unified data lake that underpins every Microsoft Fabric workload. Every Fabric tenant gets exactly one OneLake instance. Every Fabric item—Lakehouses, Warehouses, Eventhouses, Semantic Models, Dataflows—automatically stores its data inside OneLake in Delta Parquet format. There is no separate storage account to provision, no Azure Data Lake Storage Gen2 container to wire up manually. It is simply there.
The architectural metaphor that best captures OneLake is a corporate building. The building exists once. Each floor belongs to a department (a Fabric domain). Each room on a floor is a team’s workspace. Each desk in that room holds a data item. Anyone with the right access badge can walk in and find what they need — without needing a separate building.
The OneLake Namespace
One of the most practical benefits of OneLake is its consistent addressing scheme. Every piece of data in OneLake is addressable via a URI that follows this pattern:
OneLake URI Format:
https://onelake.dfs.fabric.microsoft.com/{workspace}/{item}.{itemType}/{path}
Example — Lakehouse table:
https://onelake.dfs.fabric.microsoft.com/SalesAnalytics/SalesLH.Lakehouse/Tables/FactSales
This means that any tool compatible with ADLS Gen2—Azure Storage Explorer, Azure Databricks, Apache Spark, or Power BI—can connect to OneLake data using the same APIs and SDKs they already use, simply by substituting the OneLake URI.
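For instance, here is a minimal Python sketch that lists the contents of a lakehouse through the OneLake endpoint using the standard Azure Storage SDK and Entra ID sign-in. The workspace and lakehouse names are the examples from above; substitute your own.

```python
# pip install azure-identity azure-storage-file-datalake
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes the same DFS endpoint shape as ADLS Gen2;
# the workspace plays the role of the container (file system).
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),  # signs in with your Entra ID identity
)

fs = service.get_file_system_client("SalesAnalytics")  # workspace name

# List everything under the lakehouse's Tables folder.
for path in fs.get_paths(path="SalesLH.Lakehouse/Tables"):
    print(path.name)
```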
What Lives in OneLake
| Fabric Item Type | What It Stores in OneLake |
|---|---|
| Lakehouse | Delta Parquet tables (managed) + unstructured files |
| Data Warehouse | Delta Parquet tables—queryable via T-SQL |
| Eventhouse / KQL Database | Event data in columnar format |
| Semantic Model (Direct Lake) | References Delta tables — no separate import |
| Dataflow Gen2 outputs | Delta tables in a staging lakehouse |
| Mirrored databases | Near-real-time replicated Delta Parquet tables |
The OneLake Catalog — Your Central Discovery Hub
The OneLake Catalog is the evolved replacement for what was previously called the OneLake Data Hub. It is the single, searchable interface for all discoverable Fabric items—the storefront for your organization’s data assets.
What makes the catalog more than a simple list is how it organizes data into context. Users can scope the catalog to a specific domain and subdomain—for example, Finance > EMEA—and then filter further by item type, endorsement status, workspace, or last refresh date. The result is not a flat directory but a navigable, governed data marketplace.
How to Access the Catalog
The OneLake Catalog is accessible from multiple surfaces, which is deliberate—data discovery should happen in the context where work is being done, not as a separate detour.
• Microsoft Fabric portal — the primary experience with full filtering and governance
• Microsoft Teams — embedded catalog so analysts never leave their collaboration tool
• Microsoft Excel — discover and connect to certified datasets from within a workbook
• Power BI Desktop — connect to lakehouses and warehouses without leaving the modelling experience
OneLake Shortcuts — Connect Without Copying
Shortcuts are one of the most architecturally important features in OneLake—and one of the most underutilized. A shortcut is a pointer: a metadata reference that makes data stored elsewhere appear as if it lives natively inside your OneLake lakehouse. No data moves. No copy is created. The shortcut simply says, ‘Look over there.’
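Shortcuts are typically created through the lakehouse UI (shown step by step later in this post), but they can also be created programmatically. The sketch below is a hedged illustration of the Fabric REST API's Create Shortcut operation; the GUIDs are placeholders, and you should confirm the request shape against the current API reference.

```python
# pip install azure-identity requests
import requests
from azure.identity import DefaultAzureCredential

# Acquire a token for the Fabric REST API with your Entra ID identity.
token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

workspace_id = "<consumer-workspace-guid>"   # placeholder
lakehouse_id = "<consumer-lakehouse-guid>"   # placeholder

# Create a OneLake shortcut named 'sales' under the Tables folder,
# pointing at a table that lives in the producing workspace.
body = {
    "path": "Tables",
    "name": "sales",
    "target": {
        "oneLake": {
            "workspaceId": "<source-workspace-guid>",  # placeholder
            "itemId": "<source-lakehouse-guid>",       # placeholder
            "path": "Tables/sales",
        }
    },
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
resp.raise_for_status()
print(resp.json())
```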
Where Shortcuts Can Point
| Shortcut Target | Use Case | Authentication |
|---|---|---|
| Another OneLake location | Cross-workspace data sharing without duplication | User identity (passthrough) |
| Azure Data Lake Storage Gen2 | Legacy Azure data or external team storage | Account key / Service Principal / Trusted Workspace Access |
| Amazon S3 | Multi-cloud scenarios — data already in AWS | IAM access key + secret |
| Amazon S3-Compatible sources | MinIO, Cloudflare R2, and other S3-API services | IAM access key + secret |
| Google Cloud Storage | Data in GCP environments | HMAC key |
| Microsoft Dataverse | Business application data from Dynamics / Power Apps | Entra ID |
| On-premises (via OPDG) | Files or ADLS behind corporate firewall | On-premises data gateway |
Mirroring — Zero-ETL Replication into OneLake
Shortcuts answer the question, ‘How do I use data that lives outside OneLake without copying it?’ Mirroring answers a different question: ‘How do I keep a near-real-time, governed copy of an operational database inside OneLake—without building a pipeline?’
Mirroring is a no-ETL, continuous replication feature. It monitors a source database for changes and replicates those changes into OneLake as Delta Parquet tables, typically within seconds to minutes. Once mirrored, the data is a full first-class OneLake citizen — queryable via SQL, Spark, and Power BI Direct Lake.
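Because mirrored tables are ordinary Delta tables in OneLake, any Fabric notebook can read them directly. A minimal PySpark sketch, where the workspace (Sales), mirrored item (OrdersMirror), and table (dbo.Orders) are hypothetical names:

```python
# Runs in a Fabric notebook, where the `spark` session is preconfigured.
# Workspace, item, and table names below are hypothetical placeholders.
path = (
    "abfss://Sales@onelake.dfs.fabric.microsoft.com/"
    "OrdersMirror.MirroredDatabase/Tables/dbo/Orders"
)

orders = spark.read.format("delta").load(path)
orders.groupBy("Status").count().show()
```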
Supported Mirroring Sources
| Source | GA / Preview Status |
|---|---|
| Azure SQL Database | Generally Available |
| Azure SQL Managed Instance | Generally Available |
| Azure Cosmos DB | Generally Available |
| Azure PostgreSQL Flexible Server | Generally Available |
| SQL Server 2016–2022 and 2025 | Generally Available |
| Azure Databricks Unity Catalog | Generally Available |
| Snowflake | Generally Available |
| Oracle Database | Preview |
| Microsoft Dataverse | Preview |
| Open Mirroring (custom sources) | Preview |
Mirroring vs. Shortcuts—When to Use Which
| Consideration | Shortcuts | Mirroring |
|---|---|---|
| Data movement | No movement | Data replicated into OneLake |
| Best for | ADLS/S3/GCS file data | Relational databases |
| Latency | Real-time (reads source directly) | Near-real-time |
| Transformation | None at connection | None (raw replication) |
| SQL analytics | Supported for Delta tables | Always supported |
| Source stays live | Yes, always | Yes, source is unaffected |
Direct Lake—Query OneLake Natively from Power BI
Once your data is in OneLake—whether natively, via shortcut, or via mirroring—the question becomes: how do Power BI reports consume it without the overhead of a scheduled import?
Direct Lake is the answer. It is a Power BI storage mode that reads Delta Parquet files from OneLake directly into the analysis engine at query time—without a scheduled refresh, without a separate imported copy, and without the latency of a DirectQuery live connection to a SQL endpoint.
The Three Storage Modes Compared
| Mode | How It Works | Best For |
|---|---|---|
| Import | Full data copy loaded into in-memory column store on refresh | Static or slowly changing data with < 1GB per table |
| DirectQuery | Every visual issues a query to the source at render time | Very large data with low dashboard concurrency |
| Direct Lake | Delta files loaded on-demand from OneLake; cached in memory | Large, fast-changing data—the Fabric-native approach |
Direct Lake combines the performance of Import (in-memory column store) with the freshness of DirectQuery (no scheduled refresh needed). When the Delta table in OneLake is updated, the semantic model detects the change and reloads only the affected column segments—a process called “transcoding”—which typically completes in seconds.
Creating a Direct Lake Semantic Model
As of March 2025, Direct Lake semantic models can be authored in Power BI Desktop — not just in the Fabric portal. Key steps:
- In Power BI Desktop, select Get Data > Microsoft OneLake
- Authenticate with your Entra ID credentials
- Browse to your workspace and select the Lakehouse or Warehouse
- Select the tables you want to include—these can span multiple OneLake sources using shortcuts
- Power BI Desktop creates a Direct Lake connection—no import, no DirectQuery polling
- Build relationships and DAX measures, and publish to Fabric
Example Scenario — A Detailed OneLake Walkthrough
Create Your Fabric Workspace
Everything in Fabric lives inside a workspace. In a real enterprise, a workspace maps to a team, a project, or a data domain. For this walkthrough we create a workspace to represent the team that owns the raw sales data.
1 Navigate to the Fabric portal
Go to https://app.fabric.microsoft.com/home and sign in with your Fabric credentials.
2 Open Workspaces
In the left navigation bar, select Workspaces (the grid icon). Select + New workspace.
3 Name the workspace
Give it a meaningful name such as M365Demo_Blogs. In the Advanced section, select the Fabric or Fabric trial licence mode. Select Apply.
4 Verify the workspace
When the workspace opens, it should show an empty canvas ready for your first Fabric item.
Create a Lakehouse and Load the Sales Data
A Lakehouse is a Fabric item that combines the flexibility of a data lake (any file type, any structure) with the governance of a data warehouse (Delta tables, schema enforcement, SQL access). It is the most natural home for raw and processed data in Fabric.
1 Create the Lakehouse
In the workspace, select + New item > Lakehouse. Name it salesLH. After a moment, the lakehouse opens with empty Tables and Files folders.
2 Download the sales dataset
Open a new browser tab and navigate to: https://raw.githubusercontent.com/rajendra1918/Datasets/refs/heads/main/sales.csv Right-click anywhere on the page and select Save as to save it as sales.csv on your local machine.
3 Upload the file
In the Lakehouse explorer, highlight the Files folder. Select the ellipsis (…) menu, then Upload > Upload files. Select your sales.csv file and confirm the upload.
4 Preview the raw file
Select the Files folder to verify sales.csv uploaded. Select the file to preview its contents. You will see the raw CSV structure.
Load the CSV into a Delta Table
A raw CSV file in the Files folder is not yet queryable via SQL, and it does not benefit from Delta Lake features like ACID transactions, schema enforcement, or time travel. Loading it into a Delta table elevates the data into a governed, performant, queryable asset.
1 Trigger Load to Tables
In the ellipsis (…) menu for sales.csv, select Load to Tables > New table.
2 Set the table name
In the Load to table dialog, set the table name to sales. Confirm the load operation and wait for the table to be created.
3 Verify the table
In the Explorer pane, select the sales table to view its data preview and schema. If the table does not appear automatically, select Refresh in the Tables folder menu.
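If you prefer code over clicks, the same load can be performed from a notebook. A minimal PySpark sketch, assuming the notebook is attached to the salesLH lakehouse so that relative paths resolve against it:

```python
# Runs in a Fabric notebook attached to the salesLH lakehouse,
# where `spark` is preconfigured and Files/ resolves to the lakehouse.
df = (
    spark.read
    .option("header", True)        # first row contains column names
    .option("inferSchema", True)   # derive column types from the data
    .csv("Files/sales.csv")
)

# Write the data as a managed Delta table named 'sales'.
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```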
Understand What Was Created
When you loaded the CSV, Fabric converted it into Delta Parquet format and stored it in OneLake. Here is what now exists behind the scenes:
| Component | What It Is |
|---|---|
| Parquet files | The actual data, stored as columnar Parquet files in the Tables/sales/ folder in OneLake |
| _delta_log/ folder | Transaction log tracking every insert, update, and delete — enables ACID and time travel |
| SQL analytics endpoint | Auto-generated read-only SQL interface over the table — no setup required |
| OneLake URI | https://onelake.dfs.fabric.microsoft.com/M365Demo_Blogs/salesLH.Lakehouse/Tables/sales |
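You can see the transaction log at work from a notebook attached to salesLH. A small sketch (the available versions depend on how many writes the table has received):

```python
# Runs in a Fabric notebook attached to the salesLH lakehouse.
# Every write recorded in _delta_log/ shows up as a version here.
spark.sql("DESCRIBE HISTORY sales").show(truncate=False)

# Time travel: read the table as it looked at version 0 (the initial load).
v0 = spark.read.format("delta").option("versionAsOf", 0).load("Tables/sales")
print(v0.count())
```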
Discovering Your Data Asset in the OneLake Catalog
Now that data exists in OneLake, let us explore the discovery experience — the journey a data consumer takes to find this asset. In a real organisation, the sales Lakehouse would have been created by a different team. A business analyst on another team now needs to find and use this data. The OneLake Catalog is where that journey begins.
Opening the OneLake Catalog
1 Return to the Fabric home page
Select the Fabric icon (top left or at the top of the page) to navigate back to https://app.fabric.microsoft.com/home
2 Open the Catalog
In the left navigation pane, select the OneLake catalog icon (it looks like a data grid or table symbol). The catalog opens showing all Fabric items you have access to.
The catalog presents every Fabric item you have permission to see — regardless of which workspace it belongs to. For each item you can see:
| Metadata field | What it tells you |
|---|---|
| Item name & type | The name and icon indicating whether it is a Lakehouse, Warehouse, Semantic Model, Report, etc. |
| Workspace | Which workspace (and therefore which team/domain) owns this item |
| Owner | The Entra ID user who created or is responsible for the item |
| Last updated | When the data was last refreshed or modified |
| Endorsement badge | None / Promoted / Certified — signals the trustworthiness of the item |
| SQL connection string | The connection string for tools like SSMS, Azure Data Studio, or Tableau |
| Sensitivity label | Confidential, General, Public — from Microsoft Purview |
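The same inventory that the catalog renders is also reachable programmatically. A hedged sketch against the Fabric REST API's items endpoint, with a placeholder workspace GUID:

```python
# pip install azure-identity requests
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

workspace_id = "<workspace-guid>"  # placeholder

# List every Fabric item in the workspace, then keep only lakehouses.
resp = requests.get(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for item in resp.json().get("value", []):
    if item.get("type") == "Lakehouse":
        print(item["displayName"], item["id"])
```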
Finding the Sales Lakehouse
Use the catalog search and filters to find the lakehouse:
1 Search by name
In the catalog search bar, type sales. Your salesLH lakehouse should appear in the results.
2 Filter by item type
Use the item type filter and select Lakehouse to narrow the results to only lakehouses.
3 Select your Lakehouse
Select salesLH to open the detail pane. Review the metadata — location, owner, SQL connection string, and last updated timestamp.
4 Open the Lakehouse
Select Open to navigate directly to the Lakehouse explorer view from within the catalog — no need to manually navigate to the workspace.
Create the Analytics Workspace and Lakehouse
To demonstrate cross-workspace shortcuts, we need a second workspace representing the analytics team.
1 Create Analytics workspace
Return to your workspace list and create a second workspace. Name it Analytics (or any name representing a consumer team).
2 Create a new Lakehouse
Inside the Analytics workspace, select + New item > Lakehouse. Name it analytics. This lakehouse represents the analytics team’s working environment — separate from where the raw data lives.
Create a Shortcut to the Sales Table
1 Open the shortcut dialog
In the analytics Lakehouse explorer, select the ellipsis (…) menu on the Tables folder and select New shortcut.
2 Select OneLake as the source
In the New shortcut dialog, choose OneLake as the shortcut type. This means you are pointing to data inside your own Fabric tenant — not an external cloud.
3 Navigate to the source data
In the workspace list, select the M365Demo_Blogs workspace. Then select salesLH, expand the Tables folder, and select the sales table.
4 Review and create
On the confirmation screen, review the shortcut details and select Create.
Querying OneLake Data with the SQL Analytics Endpoint
Every Lakehouse in Fabric includes an automatically provisioned SQL analytics endpoint. This is a read-only T-SQL interface over all Delta tables in the lakehouse — including tables accessed via shortcuts. No setup is needed, no connection string to configure manually, and no separate SQL pool to provision. It is simply there the moment your first Delta table exists.
The SQL analytics endpoint makes lakehouse data accessible to SQL-proficient analysts, BI tools like Power BI and Tableau, and external tools like SQL Server Management Studio and Azure Data Studio — all without moving or transforming the data.
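As an example of that external reach, here is a hedged pyodbc sketch for connecting from any Python environment; the server name is the placeholder you copy from the SQL connection string shown in the catalog's detail pane:

```python
# pip install pyodbc  (requires the Microsoft ODBC Driver 18 for SQL Server)
import pyodbc

# Copy the SQL connection string for the lakehouse from the OneLake Catalog;
# the value below is a placeholder.
server = "<your-endpoint>.datawarehouse.fabric.microsoft.com"

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={server};"
    "Database=analytics;"                          # lakehouse name
    "Authentication=ActiveDirectoryInteractive;"   # Entra ID sign-in prompt
    "Encrypt=yes;"
)

for row in conn.execute("SELECT TOP 5 Item, Quantity FROM dbo.sales"):
    print(row.Item, row.Quantity)
```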
Switch to the SQL Analytics Endpoint
1 Open the analytics Lakehouse
Navigate to the analytics Lakehouse in your Analytics workspace.
2 Switch view mode
In the top-right drop-down (currently showing Lakehouse), select SQL analytics endpoint. The view transitions to a SQL-focused interface showing your tables and a query editor.
Query 1 — Revenue and Quantity by Item:
Let us write our first analytical query. This calculates total revenue and total quantity sold for each item, ordered by revenue descending — a common starting point for sales performance analysis.
1 Open a new query
In the toolbar, select New SQL query to open the query editor.
2 Paste and run the query
Enter the following T-SQL and select Run:
SELECT Item,
       SUM(Quantity * UnitPrice) AS TotalRevenue,
       SUM(Quantity) AS TotalQuantity
FROM [analytics].[dbo].[sales]
GROUP BY Item
ORDER BY TotalRevenue DESC;
Query 2 — Top Customers by Revenue:
A second useful lens is customer-level analysis. This query identifies the five highest-value customers — useful for account management and targeted marketing decisions.
SELECT TOP 5
       CustomerName,
       SUM(Quantity) AS TotalQuantity,
       SUM(Quantity * UnitPrice) AS TotalRevenue
FROM [analytics].[dbo].[sales]
GROUP BY CustomerName
ORDER BY TotalRevenue DESC;
Query 3 — Monthly Revenue Trend:
Time-series analysis is essential for spotting growth patterns and seasonal variation. This query aggregates revenue by month:
SELECT
    FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM') AS OrderMonth,
    COUNT(DISTINCT SalesOrderNumber) AS OrderCount,
    SUM(Quantity * UnitPrice) AS MonthlyRevenue
FROM [analytics].[dbo].[sales]
GROUP BY FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM')
ORDER BY OrderMonth ASC;
Good to know:
The SQL analytics endpoint is read-only by design. It cannot be used to INSERT, UPDATE, or DELETE data. All writes must go through Spark notebooks, Dataflows, or Pipelines. This separation of concerns ensures that analytical queries never accidentally modify the source data.
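If you do need to write, for example to append tonight's orders, the change goes through Spark against the source lakehouse. A minimal sketch, assuming the notebook is attached to salesLH and the column names and types match the table's actual schema:

```python
# Runs in a Fabric notebook attached to the salesLH lakehouse.
# Writes go through Spark; the SQL analytics endpoint stays read-only.
# Column names follow the queries above; adjust types to your real schema.
new_orders = spark.createDataFrame(
    [("SO-1001", "2025-06-01", "Ada Lovelace", "Mountain Bike", 2, 450.0)],
    ["SalesOrderNumber", "OrderDate", "CustomerName", "Item", "Quantity", "UnitPrice"],
)

new_orders.write.format("delta").mode("append").saveAsTable("sales")
```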
Creating a Semantic Model and Exploring the Data
The SQL analytics endpoint is powerful for ad-hoc querying, but it requires SQL knowledge and a query editor. Business users — account managers, finance leads, sales directors — need something more accessible. This is where the Semantic Model comes in.
A Semantic Model is a business-friendly layer on top of your Delta tables. It defines relationships between tables, creates pre-built DAX measures, applies friendly column names, and enables self-service reporting in Power BI. Critically, when built on top of a OneLake Lakehouse, the semantic model uses Direct Lake mode — meaning it reads Delta files directly from OneLake at query time, with no scheduled import and no data duplication.
Create the Semantic Model
1 Stay in SQL analytics endpoint view
Remain on the SQL analytics endpoint view of your analytics Lakehouse.
2 Select New semantic model
In the toolbar, select New semantic model.
3 Configure the model
In the Create a semantic model dialog, verify that the sales table is selected (it should be ticked by default). Give the model a name such as Sales Analysis. Select Create.
4 Locate the model in the workspace
After creation, navigate back to your Analytics workspace. You will see a new item with the semantic model icon — Sales Analysis.
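Before moving to visuals, you can sanity-check the model from a Fabric notebook with the semantic link library, which ships with the Fabric Spark runtime. A minimal sketch using the model name from the previous step:

```python
# Runs in a Fabric notebook; sempy (semantic link) is preinstalled there.
# Model, table, and column names follow the walkthrough.
import sempy.fabric as fabric

revenue_by_item = fabric.evaluate_dax(
    dataset="Sales Analysis",
    dax_string="""
    EVALUATE
    SUMMARIZECOLUMNS(
        'sales'[Item],
        "TotalRevenue", SUMX('sales', 'sales'[Quantity] * 'sales'[UnitPrice])
    )
    """,
)
print(revenue_by_item.head())
```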
Explore the Data Visually
The Explore this data feature is a lightweight, browser-based visual exploration tool. It lets business users drag and drop fields to create visualisations without needing Power BI Desktop or any local installation.
1 Open Explore this data
In your Analytics workspace, find the Sales Analysis semantic model. Select the ellipsis (…) menu and choose Explore this data.
2 Create your first visual
In the Explore window, drag the Item field from the data pane onto the canvas. This creates a table visual showing all items.
3 Add a measure
Drag Quantity to the Values area. The visual now shows total quantity per item.
Suggested Explorations
After building the initial visual, try the following to demonstrate the full range of the Explore experience:
| Exploration | Fields to Use |
|---|---|
| Revenue by item | Item on axis, SUM(Quantity * UnitPrice) as value — use a calculated column or measure |
| Sales over time | OrderDate on axis (grouped by month), Quantity as value — line chart |
| Top customers | CustomerName on axis, Quantity as value — sorted descending, top N filter |
| Revenue breakdown | Item as category — pie or donut chart |
Direct Lake advantage:
Because the semantic model uses Direct Lake mode, the visualisations always reflect the current state of the Delta table in OneLake. No scheduled refresh, no stale data, no import window to wait for. When the sales table is updated with new orders tonight, tomorrow’s report already includes them.
Summary — The Full Picture
At this point in our walkthrough, we have working data, a shortcut, SQL queries, and a semantic model. But there is a governance gap: nobody outside our team knows this work exists, and nobody can signal to colleagues that this is the authoritative sales dataset. Endorsement and Discoverability close this gap.
Let us bring the full journey together in one view. When we talk about discovering and connecting to data in OneLake, we are describing a layered architecture where each capability builds on the one below it.
| Layer | Capability | What It Eliminates |
|---|---|---|
| Storage | OneLake — single, unified Delta lake | Siloed, multi-account storage sprawl |
| Discovery | OneLake Catalog — searchable data marketplace | Hunting across workspaces and Teams messages |
| Trust | Endorsement + Discoverability | Competing ‘versions of the truth’ |
| Connection | Shortcuts — zero-copy pointers to external data | ETL pipelines just to make data accessible |
| Replication | Mirroring — near-real-time DB sync into OneLake | Complex CDC pipelines from operational systems |
| Analytics | Direct Lake — in-memory Power BI on OneLake data | Stale import refreshes and DirectQuery slowness |
| Governance | Catalog Govern tab + Purview labels + lineage | Shadow IT and ungoverned data sprawl |
The biggest shift OneLake brings is not technical — it is cultural. When data engineers publish once and business users discover without raising a ticket, and analysts connect without copying, the organisation stops functioning as a collection of data silos and starts behaving like a single intelligent data platform.
That is the promise of OneLake — and as of 2026, it is ready for production use.
Thanks for Reading!
