Your Data is Already There — You Just Can’t Find It Yet: OneLake Discovery Unpacked

Every department I engage with encounters the same underlying issue: data is plentiful, yet accessing the right information at the right moment—with sufficient confidence—is a persistent challenge. Analysts end up recreating datasets, dashboards proliferate uncontrollably, and engineers devote more effort to maintaining duplicates than extracting meaningful insights. Ironically, much of this data resides within the same cloud environment—yet it feels as inaccessible as if it were located continents apart.

The Data Discovery Problem

Microsoft Fabric’s OneLake was designed to solve this at the platform level. But owning a unified lake is only half the story. The real capability lies in how you discover what lives in OneLake and how you connect to it — without duplicating, without re-engineering, and without losing governance.

This blog post is a practitioner’s deep dive into the full discovery-to-connection lifecycle in OneLake. We will cover:

• What OneLake is and why it matters as an organisational foundation
• The OneLake Catalog—your single pane of glass for data discovery
• Endorsement and Discoverability — building a culture of trusted data
• Shortcuts — connecting to data across clouds without duplication
• Mirroring — zero-ETL replication into OneLake
• Direct Lake — querying OneLake natively from Power BI
• Governance best practices that make all of the above sustainable

OneLake—One Lake for the Entire Organisation

Before we can discuss discovery, we need a clear mental model of what OneLake actually is.

OneLake is the single, unified data lake that underpins every Microsoft Fabric workload. Every Fabric tenant gets exactly one OneLake instance. Every Fabric item—Lakehouses, Warehouses, Eventhouses, Semantic Models, Dataflows—automatically stores its data inside OneLake in Delta Parquet format. There is no separate storage account to provision, no Azure Data Lake Storage Gen2 container to wire up manually. It is simply there.

The architectural metaphor that best captures OneLake is a corporate building. The building exists once. Each floor belongs to a department (a Fabric domain). Each room on a floor is a team’s workspace. Each desk in that room holds a data item. Anyone with the right access badge can walk in and find what they need — without needing a separate building.

The OneLake Namespace

One of the most practical benefits of OneLake is its consistent addressing scheme. Every piece of data in OneLake is addressable via a URI that follows this pattern:

OneLake URI Format:
https://onelake.dfs.fabric.microsoft.com/{workspace}/{item}.{itemType}/{path}

Example — Lakehouse table: 
https://onelake.dfs.fabric.microsoft.com/SalesAnalytics/SalesLH.Lakehouse/Tables/FactSales

This means that any tool compatible with ADLS Gen2—Azure Storage Explorer, Azure Databricks, Apache Spark, or Power BI—can connect to OneLake data using the same APIs and SDKs they already use, simply by substituting the OneLake URI. 
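Because OneLake speaks the ADLS Gen2 protocol, the URI pattern above can be built and consumed programmatically. Here is a minimal Python sketch; the `onelake_uri` helper is a hypothetical convenience function (not part of any official SDK), and the commented-out SDK usage shows how the same endpoint plugs into the `azure-storage-file-datalake` client:

```python
from urllib.parse import quote

ONELAKE_DFS = "https://onelake.dfs.fabric.microsoft.com"

def onelake_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build an ADLS Gen2-style OneLake URI following the pattern above.

    Hypothetical helper: workspace and item names are URL-encoded in case
    they contain spaces or special characters; the path segment is passed
    through as-is because it may contain folder separators.
    """
    base = f"{ONELAKE_DFS}/{quote(workspace)}/{quote(item)}.{item_type}"
    return f"{base}/{path}" if path else base

uri = onelake_uri("SalesAnalytics", "SalesLH", "Lakehouse", "Tables/FactSales")
print(uri)

# With the azure-storage-file-datalake SDK, the same endpoint works directly,
# treating the workspace as the file system (container):
#   from azure.identity import DefaultAzureCredential
#   from azure.storage.filedatalake import DataLakeServiceClient
#   svc = DataLakeServiceClient(ONELAKE_DFS, credential=DefaultAzureCredential())
#   fs = svc.get_file_system_client("SalesAnalytics")
#   for p in fs.get_paths(path="SalesLH.Lakehouse/Tables"):
#       print(p.name)
```

The key point is that nothing OneLake-specific is required on the client side: any code that already talks to ADLS Gen2 only needs the account URL swapped.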

 

What Lives in OneLake 

| Fabric Item Type | What It Stores in OneLake |
| --- | --- |
| Lakehouse | Delta Parquet tables (managed) + unstructured files |
| Data Warehouse | Delta Parquet tables — queryable via T-SQL |
| Eventhouse / KQL Database | Event data in columnar format |
| Semantic Model (Direct Lake) | References Delta tables — no separate import |
| Dataflow Gen2 outputs | Delta tables in a staging lakehouse |
| Mirrored databases | Near-real-time replicated Delta Parquet tables |

The OneLake Catalog — Your Central Discovery Hub

The OneLake Catalog is the evolved replacement for what was previously called the OneLake Data Hub. It is the single, searchable interface for all discoverable Fabric items—the storefront for your organization’s data assets.

What makes the catalog more than a simple list is how it organizes data into context. Users can scope the catalog to a specific domain and subdomain—for example, Finance > EMEA—and then filter further by item type, endorsement status, workspace, or last refresh date. The result is not a flat directory but a navigable, governed data marketplace.

How to Access the Catalog
The OneLake Catalog is accessible from multiple surfaces, which is deliberate—data discovery should happen in the context where work is being done, not as a separate detour.

• Microsoft Fabric portal — the primary experience with full filtering and governance
• Microsoft Teams — embedded catalog so analysts never leave their collaboration tool
• Microsoft Excel — discover and connect to certified datasets from within a workbook
• Power BI Desktop — connect to lakehouses and warehouses without leaving the modelling experience

OneLake Shortcuts — Connect Without Copying

Shortcuts are one of the most architecturally important features in OneLake—and one of the most underutilized. A shortcut is a pointer: a metadata reference that makes data stored elsewhere appear as if it lives natively inside your OneLake lakehouse. No data moves. No copy is created. The shortcut simply says, ‘Look over there.’

Where Shortcuts Can Point

| Shortcut Target | Use Case | Authentication |
| --- | --- | --- |
| Another OneLake location | Cross-workspace data sharing without duplication | User identity (passthrough) |
| Azure Data Lake Storage Gen2 | Legacy Azure data or external team storage | Account key / Service Principal / Trusted Workspace Access |
| Amazon S3 | Multi-cloud scenarios — data already in AWS | IAM access key + secret |
| Amazon S3-compatible sources | MinIO, Cloudflare R2, and other S3-API services | IAM access key + secret |
| Google Cloud Storage | Data in GCP environments | HMAC key |
| Microsoft Dataverse | Business application data from Dynamics / Power Apps | Entra ID |
| On-premises (via OPDG) | Files or ADLS behind corporate firewall | On-premises data gateway |
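Shortcuts can also be created programmatically through the Fabric REST API (a POST to `/v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`). The sketch below builds the request body for a OneLake-to-OneLake shortcut; the payload shape reflects the author's reading of that API and should be verified against the current API reference, and the GUID placeholders are deliberately left unfilled:

```python
def onelake_shortcut_payload(name, parent_path, src_workspace_id, src_item_id, src_path):
    """Request body for creating a OneLake-to-OneLake shortcut.

    Field names follow the Fabric REST shortcut API as the author
    understands it — verify against the current API reference.
    """
    return {
        "name": name,              # shortcut name as it appears in the lakehouse
        "path": parent_path,       # where the shortcut is created, e.g. "Tables"
        "target": {
            "oneLake": {
                "workspaceId": src_workspace_id,  # GUID of the source workspace
                "itemId": src_item_id,            # GUID of the source lakehouse
                "path": src_path,                 # e.g. "Tables/sales"
            }
        },
    }

payload = onelake_shortcut_payload(
    name="sales",
    parent_path="Tables",
    src_workspace_id="<source-workspace-guid>",
    src_item_id="<source-lakehouse-guid>",
    src_path="Tables/sales",
)
# The payload would then be posted with a bearer token, e.g.:
#   requests.post(
#       f"https://api.fabric.microsoft.com/v1/workspaces/{ws}/items/{lh}/shortcuts",
#       json=payload, headers={"Authorization": f"Bearer {token}"})
```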

Mirroring — Zero-ETL Replication into OneLake

Shortcuts answer the question, ‘How do I use data that lives outside OneLake without copying it?’ Mirroring answers a different question: ‘How do I keep a near-real-time, governed copy of an operational database inside OneLake—without building a pipeline?’

Mirroring is a no-ETL, continuous replication feature. It monitors a source database for changes and replicates those changes into OneLake as Delta Parquet tables, typically within seconds to minutes. Once mirrored, the data is a full first-class OneLake citizen — queryable via SQL, Spark, and Power BI Direct Lake.

Supported Mirroring Sources

| Source | GA / Preview Status |
| --- | --- |
| Azure SQL Database | Generally Available |
| Azure SQL Managed Instance | Generally Available |
| Azure Cosmos DB | Generally Available |
| Azure PostgreSQL Flexible Server | Generally Available |
| SQL Server 2016 – 2022 and 2025 | Generally Available |
| Azure Databricks Unity Catalog | Generally Available |
| Snowflake | Generally Available |
| Oracle Database | Preview |
| Microsoft Dataverse | Preview |
| Open Mirroring (custom sources) | Preview |

Mirroring vs. Shortcuts—When to Use Which

| Consideration | Shortcuts | Mirroring |
| --- | --- | --- |
| Data movement | No movement | Data replicated into OneLake |
| Best for | ADLS/S3/GCS file data | Relational databases |
| Latency | Real-time (reads source directly) | Near-real-time |
| Transformation | None at connection | None (raw replication) |
| SQL analytics | Supported for Delta tables | Always supported |
| Source stays live | Yes, always | Yes, source is unaffected |

Direct Lake—Query OneLake Natively from Power BI

Once your data is in OneLake—whether natively, via shortcut, or via mirroring—the question becomes: how do Power BI reports consume it without the overhead of a scheduled import?

Direct Lake is the answer. It is a Power BI storage mode that reads Delta Parquet files from OneLake directly into the analysis engine at query time—without a scheduled refresh, without a separate imported copy, and without the latency of a DirectQuery live connection to a SQL endpoint.

The Three Storage Modes Compared

| Mode | How It Works | Best For |
| --- | --- | --- |
| Import | Full data copy loaded into in-memory column store on refresh | Static or slowly changing data with < 1 GB per table |
| DirectQuery | Every visual issues a query to the source at render time | Very large data with low dashboard concurrency |
| Direct Lake | Delta files loaded on demand from OneLake; cached in memory | Large, fast-changing data — the Fabric-native approach |

Direct Lake combines the performance of Import (in-memory column store) with the freshness of DirectQuery (no scheduled refresh needed). When the Delta table in OneLake is updated, the semantic model detects the change and reloads only the affected column segments—a process called “transcoding”—which typically completes in seconds.

Creating a Direct Lake Semantic Model

As of March 2025, Direct Lake semantic models can be authored in Power BI Desktop — not just in the Fabric portal. Key steps:

  1. In Power BI Desktop, select Get Data > Microsoft OneLake
  2. Authenticate with your Entra ID credentials
  3. Browse to your workspace and select the Lakehouse or Warehouse
  4. Select the tables you want to include—these can span multiple OneLake sources using shortcuts
  5. Power BI Desktop creates a Direct Lake connection—no import, no DirectQuery polling
  6. Build relationships and DAX measures, and publish to Fabric

Example Scenarios with OneLake: A Detailed Walkthrough

Create Your Fabric Workspace

Everything in Fabric lives inside a workspace. In a real enterprise, a workspace maps to a team, a project, or a data domain. For this walkthrough we create a workspace to represent the team that owns the raw sales data.

1 Navigate to the Fabric portal
Go to https://app.fabric.microsoft.com/home and sign in with your Fabric credentials.

2 Open Workspaces
In the left navigation bar, select Workspaces (the grid icon). Select + New workspace.

3 Name the workspace
Give it a meaningful name such as M365Demo_Blogs. In the Advanced section, select the Fabric or Fabric trial licence mode. Select Apply.

4 Verify the workspace
When the workspace opens, it should show an empty canvas ready for your first Fabric item.

Create a Lakehouse and Load the Sales Data

A Lakehouse is a Fabric item that combines the flexibility of a data lake (any file type, any structure) with the governance of a data warehouse (Delta tables, schema enforcement, SQL access). It is the most natural home for raw and processed data in Fabric.

1 Create the Lakehouse
In the workspace, select + New item > Lakehouse. Name it salesLH. After a moment, the lakehouse opens with empty Tables and Files folders.

2 Download the sales dataset
Open a new browser tab and navigate to: https://raw.githubusercontent.com/rajendra1918/Datasets/refs/heads/main/sales.csv Right-click anywhere on the page and select Save as to save it as sales.csv on your local machine.

3 Upload the file
In the Lakehouse explorer, highlight the Files folder. Select the ellipsis (…) menu, then Upload > Upload files. Select your sales.csv file and confirm the upload.

4 Preview the raw file
Select the Files folder to verify sales.csv uploaded. Select the file to preview its contents. You will see the raw CSV structure.

Load the CSV into a Delta Table
A raw CSV file in the Files folder is not yet queryable via SQL, and it does not benefit from Delta Lake features like ACID transactions, schema enforcement, or time travel. Loading it into a Delta table elevates the data into a governed, performant, queryable asset.

1 Trigger Load to Tables
In the ellipsis (…) menu for sales.csv, select Load to Tables > New table.

2 Set the table name
In the Load to table dialog, set the table name to sales. Confirm the load operation and wait for the table to be created.

3 Verify the table
In the Explorer pane, select the sales table to view its data preview and schema. If the table does not appear automatically, select Refresh in the Tables folder menu.

Understand What Was Created
When you loaded the CSV, Fabric converted it into Delta Parquet format and stored it in OneLake. Here is what now exists behind the scenes:

| Component | What It Is |
| --- | --- |
| Parquet files | The actual data, stored as columnar Parquet files in the Tables/sales/ folder in OneLake |
| _delta_log/ folder | Transaction log tracking every insert, update, and delete — enables ACID and time travel |
| SQL analytics endpoint | Auto-generated read-only SQL interface over the table — no setup required |
| OneLake URI | https://onelake.dfs.fabric.microsoft.com/M365Demo_Blogs/salesLH.Lakehouse/Tables/sales |
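The _delta_log mechanics can be made concrete with a few lines of Python. Each log file contains one JSON "action" per line (protocol, metaData, add, remove, and so on), and replaying the add/remove actions yields the table's current file set. This is a deliberately simplified sketch of the Delta protocol using a synthetic log line, not a full reader:

```python
import json

# Hypothetical single line from _delta_log/00000000000000000000.json;
# real logs contain one JSON action object per line.
log_line = '{"add": {"path": "part-00000-abc.snappy.parquet", "size": 1024, "dataChange": true}}'

def active_files(log_lines):
    """Replay add/remove actions to compute the table's current file set."""
    files = set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

print(active_files([log_line]))
```

This replay is exactly what gives Delta tables ACID semantics and time travel: a reader at any version simply stops replaying at that version's log entry.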

Discovering Your Data Asset in the OneLake Catalog

Now that data exists in OneLake, let us explore the discovery experience — the journey a data consumer takes to find this asset. In a real organisation, the salesLH Lakehouse would have been created by a data engineering team. A business analyst on a different team now needs to find and use this data. The OneLake Catalog is where that journey begins.

Opening the OneLake Catalog
1 Return to the Fabric home page
Select the Fabric icon (top left or at the top of the page) to navigate back to https://app.fabric.microsoft.com/home

2 Open the Catalog
In the left navigation pane, select the OneLake catalog icon (it looks like a data grid or table symbol). The catalog opens showing all Fabric items you have access to.

The catalog presents every Fabric item you have permission to see — regardless of which workspace it belongs to. For each item you can see:

| Metadata field | What it tells you |
| --- | --- |
| Item name & type | The name and icon indicating whether it is a Lakehouse, Warehouse, Semantic Model, Report, etc. |
| Workspace | Which workspace (and therefore which team/domain) owns this item |
| Owner | The Entra ID user who created or is responsible for the item |
| Last updated | When the data was last refreshed or modified |
| Endorsement badge | None / Promoted / Certified — signals the trustworthiness of the item |
| SQL connection string | The connection string for tools like SSMS, Azure Data Studio, or Tableau |
| Sensitivity label | Confidential, General, Public — from Microsoft Purview |

Finding Sales Lakehouse
Use the catalog search and filters to find the lakehouse:

1 Search by name
In the catalog search bar, type sales. Your salesLH lakehouse should appear in the results.

2 Filter by item type
Use the item type filter and select Lakehouse to narrow the results to only lakehouses.

3 Select your Lakehouse
Select salesLH to open the detail pane. Review the metadata — location, owner, SQL connection string, and last-updated timestamp.

4 Open the Lakehouse
Select Open to navigate directly to the Lakehouse explorer view from within the catalog — no need to manually navigate to the workspace.

Create the Analytics Workspace and Lakehouse

To demonstrate cross-workspace shortcuts, we need a second workspace representing the analytics team.

1 Create Analytics workspace
Return to your workspace list and create a second workspace. Name it Analytics (or any name representing a consumer team).

2 Create a new Lakehouse
Inside the Analytics workspace, select + New item > Lakehouse. Name it analytics. This lakehouse represents the analytics team’s working environment — separate from where the raw data lives.

Create a Shortcut to the Sales Table

1 Open the shortcut dialog
In the analytics Lakehouse explorer, select the ellipsis (…) menu on the Tables folder and select New shortcut.

2 Select OneLake as the source
In the New shortcut dialog, choose OneLake as the shortcut type. This means you are pointing to data inside your own Fabric tenant — not an external cloud.

3 Navigate to the source data
In the workspace list, select the workspace that holds the raw data. Then select salesLH, expand the Tables folder, and select the sales table.

4 Review and create
On the confirmation screen, review the shortcut details and select Create.

For reference, these are the shortcut target types available in the dialog:

| Shortcut Target | When to Use | Authentication |
| --- | --- | --- |
| OneLake (same tenant) | Cross-workspace data sharing without duplication | User identity passthrough |
| ADLS Gen2 | Legacy Azure storage or external team data | Service principal / Trusted Workspace Access |
| Amazon S3 | Data already managed in AWS | IAM access key + secret |
| Google Cloud Storage | Multi-cloud scenarios with GCP | HMAC key |
| Dataverse | Business data from Dynamics 365 / Power Apps | Entra ID |
| On-premises via OPDG | Files behind corporate firewall | On-premises data gateway |

Querying OneLake Data with the SQL Analytics Endpoint

Every Lakehouse in Fabric includes an automatically provisioned SQL analytics endpoint. This is a read-only T-SQL interface over all Delta tables in the lakehouse — including tables accessed via shortcuts. No setup is needed, no connection string to configure manually, and no separate SQL pool to provision. It is simply there the moment your first Delta table exists.

The SQL analytics endpoint makes lakehouse data accessible to SQL-proficient analysts, BI tools like Power BI and Tableau, and external tools like SQL Server Management Studio and Azure Data Studio — all without moving or transforming the data.
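Connecting from those external tools follows the usual SQL Server patterns. Here is a sketch of building an ODBC connection string for the endpoint; the server value is a placeholder (the real one is copied from the SQL connection string shown in the OneLake Catalog detail pane), and the driver and authentication mode shown are common choices rather than requirements:

```python
def sql_endpoint_conn_str(server: str, database: str) -> str:
    """ODBC connection string for a Lakehouse SQL analytics endpoint.

    'server' comes from the item's detail pane in the OneLake Catalog;
    the driver and auth values here are typical, not the only options.
    """
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={database};"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )

conn_str = sql_endpoint_conn_str(
    "<your-endpoint>.datawarehouse.fabric.microsoft.com",  # placeholder
    "analytics",  # the database name matches the lakehouse name
)
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.execute("SELECT TOP 5 * FROM dbo.sales").fetchall()
```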

Switch to the SQL Analytics Endpoint
1 Open the analytics Lakehouse
Navigate to the analytics Lakehouse in your Analytics workspace.

2 Switch view mode
In the top-right drop-down (currently showing Lakehouse), select SQL analytics endpoint. The view transitions to a SQL-focused interface showing your tables and a query editor.

Query 1 — Revenue and Quantity by Item:
Let us write our first analytical query. This calculates total revenue and total quantity sold for each item, ordered by revenue descending — a common starting point for sales performance analysis.

1 Open a new query
In the toolbar, select New SQL query to open the query editor.

2 Paste and run the query
Enter the following T-SQL and select Run:

```sql
SELECT
    Item,
    SUM(Quantity * UnitPrice) AS TotalRevenue,
    SUM(Quantity)             AS TotalQuantity
FROM [analytics].[dbo].[sales]
GROUP BY Item
ORDER BY TotalRevenue DESC;
```

Query 2:
A second useful lens is customer-level analysis. This query identifies the five highest-value customers — useful for account management and targeted marketing decisions.

```sql
SELECT TOP 5
    CustomerName,
    SUM(Quantity)             AS TotalQuantity,
    SUM(Quantity * UnitPrice) AS TotalRevenue
FROM [analytics].[dbo].[sales]
GROUP BY CustomerName
ORDER BY TotalRevenue DESC;
```

Query 3: Time-series analysis is essential for spotting growth patterns and seasonal variation. This query aggregates revenue by month.

```sql
SELECT
    FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM') AS OrderMonth,
    COUNT(DISTINCT SalesOrderNumber)           AS OrderCount,
    SUM(Quantity * UnitPrice)                  AS MonthlyRevenue
FROM [analytics].[dbo].[sales]
GROUP BY FORMAT(CAST(OrderDate AS DATE), 'yyyy-MM')
ORDER BY OrderMonth ASC;
```
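If you want to sanity-check the logic of the monthly rollup in Query 3 outside Fabric, the same aggregation can be reproduced in pandas against a few hypothetical rows that mirror the sales.csv column names used in these queries:

```python
import pandas as pd

# Hypothetical sample rows mirroring the sales.csv schema from the walkthrough
orders = pd.DataFrame({
    "SalesOrderNumber": ["SO1", "SO1", "SO2", "SO3"],
    "OrderDate": ["2024-01-15", "2024-01-20", "2024-02-03", "2024-02-10"],
    "Item": ["Bike", "Helmet", "Bike", "Bike"],
    "Quantity": [2, 1, 1, 3],
    "UnitPrice": [500.0, 50.0, 500.0, 500.0],
})

# Same shape as the T-SQL: FORMAT(OrderDate, 'yyyy-MM'), COUNT(DISTINCT ...), SUM(...)
orders["OrderMonth"] = pd.to_datetime(orders["OrderDate"]).dt.strftime("%Y-%m")
orders["Revenue"] = orders["Quantity"] * orders["UnitPrice"]

monthly = (
    orders.groupby("OrderMonth")
          .agg(OrderCount=("SalesOrderNumber", "nunique"),
               MonthlyRevenue=("Revenue", "sum"))
          .reset_index()
          .sort_values("OrderMonth")
)
print(monthly)
```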

Good to know:
The SQL analytics endpoint is read-only by design. It cannot be used to INSERT, UPDATE, or DELETE data. All writes must go through Spark notebooks, Dataflows, or Pipelines. This separation of concerns ensures that analytical queries never accidentally modify the source data.

Creating a Semantic Model and Exploring the Data

The SQL analytics endpoint is powerful for ad-hoc querying, but it requires SQL knowledge and a query editor. Business users — account managers, finance leads, sales directors — need something more accessible. This is where the Semantic Model comes in.

A Semantic Model is a business-friendly layer on top of your Delta tables. It defines relationships between tables, creates pre-built DAX measures, applies friendly column names, and enables self-service reporting in Power BI. Critically, when built on top of a OneLake Lakehouse, the semantic model uses Direct Lake mode — meaning it reads Delta files directly from OneLake at query time, with no scheduled import and no data duplication.

Create the Semantic Model
1 Stay in SQL analytics endpoint view
Remain on the SQL analytics endpoint view of your analytics Lakehouse.

2 Select New semantic model
In the toolbar, select New semantic model.

3 Configure the model
In the Create a semantic model dialog, verify that the sales table is selected (it should be ticked by default). Give the model a name such as Sales Analysis. Select Create.

4 Locate the model in the workspace
After creation, navigate back to your Analytics workspace. You will see a new item with the semantic model icon — Sales Analysis.

Explore the Data Visually
The Explore this data feature is a lightweight, browser-based visual exploration tool. It lets business users drag and drop fields to create visualisations without needing Power BI Desktop or any local installation.

1 Open Explore this data
In your Analytics workspace, find the Sales Analysis semantic model. Select the ellipsis (…) menu and choose Explore this data.

2 Create your first visual
In the Explore window, drag the Item field from the data pane onto the canvas. This creates a table visual showing all items.

3 Add a measure
Drag Quantity to the Values area. The visual now shows total quantity per item.

Suggested Explorations
After building the initial visual, try the following to demonstrate the full range of the Explore experience:

| Exploration | Fields to Use |
| --- | --- |
| Revenue by item | Item on axis, SUM(Quantity * UnitPrice) as value — use a calculated column or measure |
| Sales over time | OrderDate on axis (grouped by month), Quantity as value — line chart |
| Top customers | CustomerName on axis, Quantity as value — sorted descending, top N filter |
| Revenue breakdown | Item as category — pie or donut chart |

Direct Lake advantage:
Because the semantic model uses Direct Lake mode, the visualisations always reflect the current state of the Delta table in OneLake. No scheduled refresh, no stale data, no import window to wait for. When the sales table is updated with new orders tonight, tomorrow’s report already includes them.

Summary — The Full Picture

At this point in our walkthrough, we have working data, a shortcut, SQL queries, and a semantic model. But there is a governance gap: nobody outside our team knows this work exists, and nobody can signal to colleagues that this is the authoritative sales dataset. Endorsement and Discoverability close this gap.

Let us bring the full journey together in one view. When we talk about discovering and connecting to data in OneLake, we are describing a layered architecture where each capability builds on the one below it.

| Layer | Capability | What It Eliminates |
| --- | --- | --- |
| Storage | OneLake — single, unified Delta lake | Siloed, multi-account storage sprawl |
| Discovery | OneLake Catalog — searchable data marketplace | Hunting across workspaces and Teams messages |
| Trust | Endorsement + Discoverability | Competing ‘versions of the truth’ |
| Connection | Shortcuts — zero-copy pointers to external data | ETL pipelines just to make data accessible |
| Replication | Mirroring — near-real-time DB sync into OneLake | Complex CDC pipelines from operational systems |
| Analytics | Direct Lake — in-memory Power BI on OneLake data | Stale import refreshes and DirectQuery slowness |
| Governance | Catalog Govern tab + Purview labels + lineage | Shadow IT and ungoverned data sprawl |

The biggest shift OneLake brings is not technical — it is cultural. When data engineers publish once and business users discover without raising a ticket, and analysts connect without copying, the organisation stops functioning as a collection of data silos and starts behaving like a single intelligent data platform.

That is the promise of OneLake — and it is available for production use today.

Thanks for Reading!
