Your Data is Already There — You Just Can’t Find It Yet: OneLake Discovery Unpacked

Every department I engage with encounters the same underlying issue: data is plentiful, yet accessing the right information at the right moment—with sufficient confidence—is a persistent challenge. Analysts end up recreating datasets, dashboards proliferate uncontrollably, and engineers devote more effort to maintaining duplicates than extracting meaningful insights. Ironically, much of this data resides within the same cloud environment—yet it feels as inaccessible as if it were located continents apart.

The Data Discovery Problem

Microsoft Fabric’s OneLake was designed to solve this at the platform level. But owning a unified lake is only half the story. The real capability lies in how you discover what lives in OneLake and how you connect to it — without duplicating, without re-engineering, and without losing governance.

This blog post is a practitioner’s deep dive into the full discovery-to-connection lifecycle in OneLake. We will cover:

• What OneLake is and why it matters as an organisational foundation
• The OneLake Catalog—your single pane of glass for data discovery
• Endorsement and Discoverability — building a culture of trusted data
• Shortcuts — connecting to data across clouds without duplication
• Mirroring — zero-ETL replication into OneLake
• Direct Lake — querying OneLake natively from Power BI
• Governance best practices that make all of the above sustainable

OneLake—One Lake for the Entire Organisation

Before we can discuss discovery, we need a clear mental model of what OneLake actually is.

OneLake is the single, unified data lake that underpins every Microsoft Fabric workload. Every Fabric tenant gets exactly one OneLake instance. Every Fabric item—Lakehouses, Warehouses, Eventhouses, Semantic Models, Dataflows—automatically stores its data inside OneLake in Delta Parquet format. There is no separate storage account to provision, no Azure Data Lake Storage Gen2 container to wire up manually. It is simply there.

The architectural metaphor that best captures OneLake is a corporate building. The building exists once. Each floor belongs to a department (a Fabric domain). Each room on a floor is a team’s workspace. Each desk in that room holds a data item. Anyone with the right access badge can walk in and find what they need — without needing a separate building.

The OneLake Namespace

One of the most practical benefits of OneLake is its consistent addressing scheme. Every piece of data in OneLake is addressable via a URI that follows this pattern:

OneLake URI Format:
https://onelake.dfs.fabric.microsoft.com/{workspace}/{item}.{itemType}/{path}

Example — Lakehouse table: 
https://onelake.dfs.fabric.microsoft.com/SalesAnalytics/SalesLH.Lakehouse/Tables/FactSales

This means that any tool compatible with ADLS Gen2—Azure Storage Explorer, Azure Databricks, Apache Spark, or Power BI—can connect to OneLake data using the same APIs and SDKs they already use, simply by substituting the OneLake URI. 
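Because OneLake speaks the ADLS Gen2 protocol, the URI pattern above can be built and consumed programmatically. Here is a minimal Python sketch; the `onelake_uri` helper is a hypothetical convenience function (not part of any official SDK), and the commented-out SDK usage shows how the same endpoint plugs into the `azure-storage-file-datalake` client:

```python
from urllib.parse import quote

ONELAKE_DFS = "https://onelake.dfs.fabric.microsoft.com"

def onelake_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build an ADLS Gen2-style OneLake URI following the pattern above.

    Hypothetical helper: workspace and item names are URL-encoded in case
    they contain spaces or special characters; the path segment is passed
    through as-is because it may contain folder separators.
    """
    base = f"{ONELAKE_DFS}/{quote(workspace)}/{quote(item)}.{item_type}"
    return f"{base}/{path}" if path else base

uri = onelake_uri("SalesAnalytics", "SalesLH", "Lakehouse", "Tables/FactSales")
print(uri)

# With the azure-storage-file-datalake SDK, the same endpoint works directly,
# treating the workspace as the file system (container):
#   from azure.identity import DefaultAzureCredential
#   from azure.storage.filedatalake import DataLakeServiceClient
#   svc = DataLakeServiceClient(ONELAKE_DFS, credential=DefaultAzureCredential())
#   fs = svc.get_file_system_client("SalesAnalytics")
#   for p in fs.get_paths(path="SalesLH.Lakehouse/Tables"):
#       print(p.name)
```

The key point is that nothing OneLake-specific is required on the client side: any code that already talks to ADLS Gen2 only needs the account URL swapped.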

 

What Lives in OneLake 

| Fabric Item Type | What It Stores in OneLake |
| --- | --- |
| Lakehouse | Delta Parquet tables (managed) + unstructured files |
| Data Warehouse | Delta Parquet tables — queryable via T-SQL |
| Eventhouse / KQL Database | Event data in columnar format |
| Semantic Model (Direct Lake) | References Delta tables — no separate import |
| Dataflow Gen2 outputs | Delta tables in a staging lakehouse |
| Mirrored databases | Near-real-time replicated Delta Parquet tables |

The OneLake Catalog — Your Central Discovery Hub

The OneLake Catalog is the evolved replacement for what was previously called the OneLake Data Hub. It is the single, searchable interface for all discoverable Fabric items—the storefront for your organization’s data assets.

What makes the catalog more than a simple list is how it organizes data into context. Users can scope the catalog to a specific domain and subdomain—for example, Finance > EMEA—and then filter further by item type, endorsement status, workspace, or last refresh date. The result is not a flat directory but a navigable, governed data marketplace.

How to Access the Catalog
The OneLake Catalog is accessible from multiple surfaces, which is deliberate—data discovery should happen in the context where work is being done, not as a separate detour.

• Microsoft Fabric portal — the primary experience with full filtering and governance
• Microsoft Teams — embedded catalog so analysts never leave their collaboration tool
• Microsoft Excel — discover and connect to certified datasets from within a workbook
• Power BI Desktop — connect to lakehouses and warehouses without leaving the modelling experience

OneLake Shortcuts — Connect Without Copying

Shortcuts are one of the most architecturally important features in OneLake—and one of the most underutilized. A shortcut is a pointer: a metadata reference that makes data stored elsewhere appear as if it lives natively inside your OneLake lakehouse. No data moves. No copy is created. The shortcut simply says, ‘Look over there.’

Where Shortcuts Can Point

| Shortcut Target | Use Case | Authentication |
| --- | --- | --- |
| Another OneLake location | Cross-workspace data sharing without duplication | User identity (passthrough) |
| Azure Data Lake Storage Gen2 | Legacy Azure data or external team storage | Account key / Service Principal / Trusted Workspace Access |
| Amazon S3 | Multi-cloud scenarios — data already in AWS | IAM access key + secret |
| Amazon S3-compatible sources | MinIO, Cloudflare R2, and other S3-API services | IAM access key + secret |
| Google Cloud Storage | Data in GCP environments | HMAC key |
| Microsoft Dataverse | Business application data from Dynamics / Power Apps | Entra ID |
| On-premises (via OPDG) | Files or ADLS behind corporate firewall | On-premises data gateway |
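Shortcuts can also be created programmatically through the Fabric REST API (a POST to `/v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`). The sketch below builds the request body for a OneLake-to-OneLake shortcut; the payload shape reflects the author's reading of that API and should be verified against the current API reference, and the GUID placeholders are deliberately left unfilled:

```python
def onelake_shortcut_payload(name, parent_path, src_workspace_id, src_item_id, src_path):
    """Request body for creating a OneLake-to-OneLake shortcut.

    Field names follow the Fabric REST shortcut API as the author
    understands it — verify against the current API reference.
    """
    return {
        "name": name,              # shortcut name as it appears in the lakehouse
        "path": parent_path,       # where the shortcut is created, e.g. "Tables"
        "target": {
            "oneLake": {
                "workspaceId": src_workspace_id,  # GUID of the source workspace
                "itemId": src_item_id,            # GUID of the source lakehouse
                "path": src_path,                 # e.g. "Tables/sales"
            }
        },
    }

payload = onelake_shortcut_payload(
    name="sales",
    parent_path="Tables",
    src_workspace_id="<source-workspace-guid>",
    src_item_id="<source-lakehouse-guid>",
    src_path="Tables/sales",
)
# The payload would then be posted with a bearer token, e.g.:
#   requests.post(
#       f"https://api.fabric.microsoft.com/v1/workspaces/{ws}/items/{lh}/shortcuts",
#       json=payload, headers={"Authorization": f"Bearer {token}"})
```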

Mirroring — Zero-ETL Replication into OneLake

Shortcuts answer the question, ‘How do I use data that lives outside OneLake without copying it?’ Mirroring answers a different question: ‘How do I keep a near-real-time, governed copy of an operational database inside OneLake—without building a pipeline?’

Mirroring is a no-ETL, continuous replication feature. It monitors a source database for changes and replicates those changes into OneLake as Delta Parquet tables, typically within seconds to minutes. Once mirrored, the data is a full first-class OneLake citizen — queryable via SQL, Spark, and Power BI Direct Lake.

Supported Mirroring Sources

| Source | GA / Preview Status |
| --- | --- |
| Azure SQL Database | Generally Available |
| Azure SQL Managed Instance | Generally Available |
| Azure Cosmos DB | Generally Available |
| Azure PostgreSQL Flexible Server | Generally Available |
| SQL Server 2016 – 2022 and 2025 | Generally Available |
| Azure Databricks Unity Catalog | Generally Available |
| Snowflake | Generally Available |
| Oracle Database | Preview |
| Microsoft Dataverse | Preview |
| Open Mirroring (custom sources) | Preview |

Mirroring vs. Shortcuts—When to Use Which

| Consideration | Shortcuts | Mirroring |
| --- | --- | --- |
| Data movement | No movement | Data replicated into OneLake |
| Best for | ADLS/S3/GCS file data | Relational databases |
| Latency | Real-time (reads source directly) | Near-real-time |
| Transformation | None at connection | None (raw replication) |
| SQL analytics | Supported for Delta tables | Always supported |
| Source stays live | Yes, always | Yes, source is unaffected |

Direct Lake—Query OneLake Natively from Power BI

Once your data is in OneLake—whether natively, via shortcut, or via mirroring—the question becomes: how do Power BI reports consume it without the overhead of a scheduled import?

Direct Lake is the answer. It is a Power BI storage mode that reads Delta Parquet files from OneLake directly into the analysis engine at query time—without a scheduled refresh, without a separate imported copy, and without the latency of a DirectQuery live connection to a SQL endpoint.

The Three Storage Modes Compared

| Mode | How It Works | Best For |
| --- | --- | --- |
| Import | Full data copy loaded into in-memory column store on refresh | Static or slowly changing data with < 1 GB per table |
| DirectQuery | Every visual issues a query to the source at render time | Very large data with low dashboard concurrency |
| Direct Lake | Delta files loaded on demand from OneLake; cached in memory | Large, fast-changing data — the Fabric-native approach |

Direct Lake combines the performance of Import (in-memory column store) with the freshness of DirectQuery (no scheduled refresh needed). When the Delta table in OneLake is updated, the semantic model detects the change and reloads only the affected column segments—a process called “transcoding”—which typically completes in seconds.

Creating a Direct Lake Semantic Model

As of March 2025, Direct Lake semantic models can be authored in Power BI Desktop — not just in the Fabric portal. Key steps:

  1. In Power BI Desktop, select Get Data > Microsoft OneLake
  2. Authenticate with your Entra ID credentials
  3. Browse to your workspace and select the Lakehouse or Warehouse
  4. Select the tables you want to include—these can span multiple OneLake sources using shortcuts
  5. Power BI Desktop creates a Direct Lake connection—no import, no DirectQuery polling
  6. Build relationships and DAX measures, and publish to Fabric

Example Scenarios with OneLake: A Detailed Walkthrough

Create Your Fabric Workspace

Everything in Fabric lives inside a workspace. In a real enterprise, a workspace maps to a team, a project, or a data domain. For this walkthrough we create a workspace to represent the team that owns the raw sales data.

1 Navigate to the Fabric portal
Go to https://app.fabric.microsoft.com/home and sign in with your Fabric credentials.

2 Open Workspaces
In the left navigation bar, select Workspaces (the grid icon). Select + New workspace.

3 Name the workspace
Give it a meaningful name such as M365Demo_Blogs. In the Advanced section, select the Fabric or Fabric trial licence mode. Select Apply.

4 Verify the workspace
When the workspace opens, it should show an empty canvas ready for your first Fabric item.

Create a Lakehouse and Load the Sales Data

A Lakehouse is a Fabric item that combines the flexibility of a data lake (any file type, any structure) with the governance of a data warehouse (Delta tables, schema enforcement, SQL access). It is the most natural home for raw and processed data in Fabric.

1 Create the Lakehouse
In the workspace, select + New item > Lakehouse. Name it salesLH. After a moment, the lakehouse opens with empty Tables and Files folders.

2 Download the sales dataset
Open a new browser tab and navigate to: https://raw.githubusercontent.com/rajendra1918/Datasets/refs/heads/main/sales.csv Right-click anywhere on the page and select Save as to save it as sales.csv on your local machine.

3 Upload the file
In the Lakehouse explorer, highlight the Files folder. Select the ellipsis (…) menu, then Upload > Upload files. Select your sales.csv file and confirm the upload.

4 Preview the raw file
Select the Files folder to verify sales.csv uploaded. Select the file to preview its contents. You will see the raw CSV structure.

Load the CSV into a Delta Table
A raw CSV file in the Files folder is not yet queryable via SQL, and it does not benefit from Delta Lake features like ACID transactions, schema enforcement, or time travel. Loading it into a Delta table elevates the data into a governed, performant, queryable asset.

1 Trigger Load to Tables
In the ellipsis (…) menu for sales.csv, select Load to Tables > New table.

2 Set the table name
In the Load to table dialog, set the table name to sales. Confirm the load operation and wait for the table to be created.

3 Verify the table
In the Explorer pane, select the sales table to view its data preview and schema. If the table does not appear automatically, select Refresh in the Tables folder menu.

Understand What Was Created
When you loaded the CSV, Fabric converted it into Delta Parquet format and stored it in OneLake. Here is what now exists behind the scenes:

| Component | What It Is |
| --- | --- |
| Parquet files | The actual data, stored as columnar Parquet files in the Tables/sales/ folder in OneLake |
| _delta_log/ folder | Transaction log tracking every insert, update, and delete — enables ACID and time travel |
| SQL analytics endpoint | Auto-generated read-only SQL interface over the table — no setup required |
| OneLake URI | https://onelake.dfs.fabric.microsoft.com/M365Demo_Blogs/salesLH.Lakehouse/Tables/sales |
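The _delta_log mechanics can be made concrete with a few lines of Python. Each log file contains one JSON "action" per line (protocol, metaData, add, remove, and so on), and replaying the add/remove actions yields the table's current file set. This is a deliberately simplified sketch of the Delta protocol using a synthetic log line, not a full reader:

```python
import json

# Hypothetical single line from _delta_log/00000000000000000000.json;
# real logs contain one JSON action object per line.
log_line = '{"add": {"path": "part-00000-abc.snappy.parquet", "size": 1024, "dataChange": true}}'

def active_files(log_lines):
    """Replay add/remove actions to compute the table's current file set."""
    files = set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

print(active_files([log_line]))
```

This replay is exactly what gives Delta tables ACID semantics and time travel: a reader at any version simply stops replaying at that version's log entry.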

Discovering Your Data Asset in the OneLake Catalog

Now that data exists in OneLake, let us explore the discovery experience — the journey a data consumer takes to find this asset. In a real organisation, the salesLH Lakehouse would have been created by a data engineering team. A business analyst on a different team now needs to find and use this data. The OneLake Catalog is where that journey begins.

Opening the OneLake Catalog
1 Return to the Fabric home page
Select the Fabric icon (top left or at the top of the page) to navigate back to https://app.fabric.microsoft.com/home

2 Open the Catalog
In the left navigation pane, select the OneLake catalog icon (it looks like a data grid or table symbol). The catalog opens showing all Fabric items you have access to.

The catalog presents every Fabric item you have permission to see — regardless of which workspace it belongs to. For each item you can see:

| Metadata field | What it tells you |
| --- | --- |
| Item name & type | The name and icon indicating whether it is a Lakehouse, Warehouse, Semantic Model, Report, etc. |
| Workspace | Which workspace (and therefore which team/domain) owns this item |
| Owner | The Entra ID user who created or is responsible for the item |
| Last updated | When the data was last refreshed or modified |
| Endorsement badge | None / Promoted / Certified — signals the trustworthiness of the item |
| SQL connection string | The connection string for tools like SSMS, Azure Data Studio, or Tableau |
| Sensitivity label | Confidential, General, Public — from Microsoft Purview |

Finding Sales Lakehouse
Use the catalog search and filters to find the lakehouse:

1 Search by name
In the catalog search bar, type sales. Your salesLH lakehouse should appear in the results.

2 Filter by item type
Use the item type filter and select Lakehouse to narrow the results to only lakehouses.

3 Select your Lakehouse
Select salesLH to open the detail pane. Review the metadata — location, owner, SQL connection string, and last-updated timestamp.

4 Open the Lakehouse
Select Open to navigate directly to the Lakehouse explorer view from within the catalog — no need to manually navigate to the workspace.

Create the Analytics Workspace and Lakehouse

To demonstrate cross-workspace shortcuts, we need a second workspace representing the analytics team.

1 Create Analytics workspace
Return to your workspace list and create a second workspace. Name it Analytics (or any name representing a consumer team).

2 Create a new Lakehouse
Inside the Analytics workspace, select + New item > Lakehouse. Name it analytics. This lakehouse represents the analytics team’s working environment — separate from where the raw data lives.

Create a Shortcut to the Sales Table

1 Open the shortcut dialog
In the analytics Lakehouse explorer, select the ellipsis (…) menu on the Tables folder and select New shortcut.

2 Select OneLake as the source
In the New shortcut dialog, choose OneLake as the shortcut type. This means you are pointing to data inside your own Fabric tenant — not an external cloud.

3 Navigate to the source data
In the workspace list, select the workspace that holds the raw data. Then select salesLH, expand the Tables folder, and select the sales table.

4 Review and create
On the confirmation screen, review the shortcut details and select Create.

For reference, these are the shortcut target types available in the dialog:

| Shortcut Target | When to Use | Authentication |
| --- | --- | --- |
| OneLake (same tenant) | Cross-workspace data sharing without duplication | User identity passthrough |
| ADLS Gen2 | Legacy Azure storage or external team data | Service principal / Trusted Workspace Access |
| Amazon S3 | Data already managed in AWS | IAM access key + secret |
| Google Cloud Storage | Multi-cloud scenarios with GCP | HMAC key |
| Dataverse | Business data from Dynamics 365 / Power Apps | Entra ID |
| On-premises via OPDG | Files behind corporate firewall | On-premises data gateway |

Querying OneLake Data with the SQL Analytics Endpoint

Every Lakehouse in Fabric includes an automatically provisioned SQL analytics endpoint. This is a read-only T-SQL interface over all Delta tables in the lakehouse — including tables accessed via shortcuts. No setup is needed, no connection string to configure manually, and no separate SQL pool to provision. It is simply there the moment your first Delta table exists.

The SQL analytics endpoint makes lakehouse data accessible to SQL-proficient analysts, BI tools like Power BI and Tableau, and external tools like SQL Server Management Studio and Azure Data Studio — all without moving or transforming the data.
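Connecting from those external tools follows the usual SQL Server patterns. Here is a sketch of building an ODBC connection string for the endpoint; the server value is a placeholder (the real one is copied from the SQL connection string shown in the OneLake Catalog detail pane), and the driver and authentication mode shown are common choices rather than requirements:

```python
def sql_endpoint_conn_str(server: str, database: str) -> str:
    """ODBC connection string for a Lakehouse SQL analytics endpoint.

    'server' comes from the item's detail pane in the OneLake Catalog;
    the driver and auth values here are typical, not the only options.
    """
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={database};"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )

conn_str = sql_endpoint_conn_str(
    "<your-endpoint>.datawarehouse.fabric.microsoft.com",  # placeholder
    "analytics",  # the database name matches the lakehouse name
)
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.execute("SELECT TOP 5 * FROM dbo.sales").fetchall()
```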

Switch to the SQL Analytics Endpoint
1 Open the analytics Lakehouse
Navigate to the analytics Lakehouse in your Analytics workspace.

2 Switch view mode
In the top-right drop-down (currently showing Lakehouse), select SQL analytics endpoint. The view transitions to a SQL-focused interface showing your tables and a query editor.

Query 1 — Revenue and Quantity by Item:
Let us write our first analytical query. This calculates total revenue and total quantity sold for each item, ordered by revenue descending — a common starting point for sales performance analysis.

1 Open a new query
In the toolbar, select New SQL query to open the query editor.

2 Paste and run the query
Enter the following T-SQL and select Run:

```sql
SELECT
    Item,
    SUM(Quantity * UnitPrice) AS TotalRevenue,
    SUM(Quantity)             AS TotalQuantity
FROM [analytics].[dbo].[sales]
GROUP BY Item
ORDER BY TotalRevenue DESC;
```

Query 2:
A second useful lens is customer-level analysis. This query identifies the five highest-value customers — useful for account management and targeted marketing decisions.

```sql
SELECT TOP 5
    CustomerName,
    SUM(Quantity)             AS TotalQuantity,
    SUM(Quantity * UnitPrice) AS TotalRevenue
FROM [analytics].[dbo].[sales]
GROUP BY CustomerName
ORDER BY TotalRevenue DESC;
```

Query 3: Time-series analysis is essential for spotting growth patterns and seasonal variation. This query aggregates revenue by month.

```sql
SELECT
    FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM') AS OrderMonth,
    COUNT(DISTINCT SalesOrderNumber)           AS OrderCount,
    SUM(Quantity * UnitPrice)                  AS MonthlyRevenue
FROM [analytics].[dbo].[sales]
GROUP BY FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM')
ORDER BY OrderMonth ASC;
```
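If you want to sanity-check the logic of the monthly rollup in Query 3 outside Fabric, the same aggregation can be reproduced in pandas against a few hypothetical rows that mirror the sales.csv column names used in these queries:

```python
import pandas as pd

# Hypothetical sample rows mirroring the sales.csv schema from the walkthrough
orders = pd.DataFrame({
    "SalesOrderNumber": ["SO1", "SO1", "SO2", "SO3"],
    "OrderDate": ["2024-01-15", "2024-01-20", "2024-02-03", "2024-02-10"],
    "Item": ["Bike", "Helmet", "Bike", "Bike"],
    "Quantity": [2, 1, 1, 3],
    "UnitPrice": [500.0, 50.0, 500.0, 500.0],
})

# Same shape as the T-SQL: FORMAT(OrderDate, 'yyyy-MM'), COUNT(DISTINCT ...), SUM(...)
orders["OrderMonth"] = pd.to_datetime(orders["OrderDate"]).dt.strftime("%Y-%m")
orders["Revenue"] = orders["Quantity"] * orders["UnitPrice"]

monthly = (
    orders.groupby("OrderMonth")
          .agg(OrderCount=("SalesOrderNumber", "nunique"),
               MonthlyRevenue=("Revenue", "sum"))
          .reset_index()
          .sort_values("OrderMonth")
)
print(monthly)
```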

Good to know:
The SQL analytics endpoint is read-only by design. It cannot be used to INSERT, UPDATE, or DELETE data. All writes must go through Spark notebooks, Dataflows, or Pipelines. This separation of concerns ensures that analytical queries never accidentally modify the source data.

Creating a Semantic Model and Exploring the Data

The SQL analytics endpoint is powerful for ad-hoc querying, but it requires SQL knowledge and a query editor. Business users — account managers, finance leads, sales directors — need something more accessible. This is where the Semantic Model comes in.

A Semantic Model is a business-friendly layer on top of your Delta tables. It defines relationships between tables, creates pre-built DAX measures, applies friendly column names, and enables self-service reporting in Power BI. Critically, when built on top of a OneLake Lakehouse, the semantic model uses Direct Lake mode — meaning it reads Delta files directly from OneLake at query time, with no scheduled import and no data duplication.

Create the Semantic Model
1 Stay in SQL analytics endpoint view
Remain on the SQL analytics endpoint view of your analytics Lakehouse.

2 Select New semantic model
In the toolbar, select New semantic model.

3 Configure the model
In the Create a semantic model dialog, verify that the sales table is selected (it should be ticked by default). Give the model a name such as Sales Analysis. Select Create.

4 Locate the model in the workspace
After creation, navigate back to your Analytics workspace. You will see a new item with the semantic model icon — Sales Analysis.

Explore the Data Visually
The Explore this data feature is a lightweight, browser-based visual exploration tool. It lets business users drag and drop fields to create visualisations without needing Power BI Desktop or any local installation.

1 Open Explore this data
In your Analytics workspace, find the Sales Analysis semantic model. Select the ellipsis (…) menu and choose Explore this data.

2 Create your first visual
In the Explore window, drag the Item field from the data pane onto the canvas. This creates a table visual showing all items.

3 Add a measure
Drag Quantity to the Values area. The visual now shows total quantity per item.

Suggested Explorations
After building the initial visual, try the following to demonstrate the full range of the Explore experience:

| Exploration | Fields to Use |
| --- | --- |
| Revenue by item | Item on axis, SUM(Quantity * UnitPrice) as value — use a calculated column or measure |
| Sales over time | OrderDate on axis (grouped by month), Quantity as value — line chart |
| Top customers | CustomerName on axis, Quantity as value — sorted descending, top N filter |
| Revenue breakdown | Item as category — pie or donut chart |

Direct Lake advantage:
Because the semantic model uses Direct Lake mode, the visualisations always reflect the current state of the Delta table in OneLake. No scheduled refresh, no stale data, no import window to wait for. When the sales table is updated with new orders tonight, tomorrow’s report already includes them.

Summary — The Full Picture

At this point in our walkthrough, we have working data, a shortcut, SQL queries, and a semantic model. But there is a governance gap: nobody outside our team knows this work exists, and nobody can signal to colleagues that this is the authoritative sales dataset. Endorsement and Discoverability close this gap.

Let us bring the full journey together in one view. When we talk about discovering and connecting to data in OneLake, we are describing a layered architecture where each capability builds on the one below it.

| Layer | Capability | What It Eliminates |
| --- | --- | --- |
| Storage | OneLake — single, unified Delta lake | Siloed, multi-account storage sprawl |
| Discovery | OneLake Catalog — searchable data marketplace | Hunting across workspaces and Teams messages |
| Trust | Endorsement + Discoverability | Competing ‘versions of the truth’ |
| Connection | Shortcuts — zero-copy pointers to external data | ETL pipelines just to make data accessible |
| Replication | Mirroring — near-real-time DB sync into OneLake | Complex CDC pipelines from operational systems |
| Analytics | Direct Lake — in-memory Power BI on OneLake data | Stale import refreshes and DirectQuery slowness |
| Governance | Catalog Govern tab + Purview labels + lineage | Shadow IT and ungoverned data sprawl |

The biggest shift OneLake brings is not technical — it is cultural. When data engineers publish once and business users discover without raising a ticket, and analysts connect without copying, the organisation stops functioning as a collection of data silos and starts behaving like a single intelligent data platform.

That is the promise of OneLake — and it is available for production use today.

Thanks for Reading!
