Unlocking Your Data with AI: Building Data Agents in Microsoft Fabric

In the world of data analytics, asking questions of your data shouldn’t require writing complex SQL queries. Microsoft Fabric’s AI Skills and Data Agents make this possible, turning natural language questions into actionable insights. This blog walks through loading the AdventureWorks dataset, creating an AI skill, and querying it programmatically—all while handling real-world challenges like ambiguous terms such as “active customers.”

Create a new workspace named “Fabric DataAgent”.

rajendraongole1_0-1765738247529.png

Configure the workspace with Fabric Capacity:

rajendraongole1_1-1765738323168.png

Workspace with Name Fabric DataAgent Created:

rajendraongole1_2-1765738361947.png

Create a Lakehouse:

rajendraongole1_3-1765738502795.png

I named this Lakehouse DataagentAIMF.

rajendraongole1_4-1765738538004.png

Once the Lakehouse is created, a SQL analytics endpoint is created automatically, as shown in the image below:

rajendraongole1_5-1765738612147.png

From here, I will ingest data via a new notebook using the following script.

rajendraongole1_6-1765738674837.png

Once you open the notebook, you can see the interface below:

rajendraongole1_7-1765738721582.png

Below is the notebook script that reads the AdventureWorks table list and loads the tables into the Lakehouse.

Script:

import pandas as pd
from tqdm.auto import tqdm

base = "https://synapseaisolutionsa.z13.web.core.windows.net/data/AdventureWorks"

# Load the list of tables
df_tables = pd.read_csv(f"{base}/adventureworks.csv", names=["table"])

for table in (pbar := tqdm(df_tables["table"].values)):
    pbar.set_description(f"Uploading {table} to lakehouse")
    # Download the parquet file for this table
    df = pd.read_parquet(f"{base}/{table}.parquet")
    # Save as a lakehouse table
    spark.createDataFrame(df).write.mode("overwrite").saveAsTable(table)
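If you want to sanity-check the URL pattern the loop relies on, without Spark or a network call, the mapping from table names to parquet files looks like this (the two table names here are only sample values):

```python
base = "https://synapseaisolutionsa.z13.web.core.windows.net/data/AdventureWorks"

# Each table name listed in adventureworks.csv maps to one parquet file
# under the same base URL.
sample_tables = ["DimCustomer", "FactInternetSales"]
urls = [f"{base}/{table}.parquet" for table in sample_tables]
print(urls[0])  # ends with /DimCustomer.parquet
```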

Script placed in notebook cell as below:

rajendraongole1_8-1765738873808.png

Paste the code above into the notebook’s code cell, then select the Run all button to execute it. After the run completes, you will see the results.

rajendraongole1_9-1765738917581.png

All the tables have been uploaded; see the snapshot below for reference:

rajendraongole1_10-1765738952788.png

You can now see the tables available under Tables in the DataAgentAIMF Lakehouse:

rajendraongole1_11-1765739035236.png

Now, from the workspace, let’s create a new item of type Data agent (Preview).

rajendraongole1_12-1765739086626.png

I named the Data Agent MFDataAgent.

rajendraongole1_13-1765739153976.png

The data agent interface looks like this:

rajendraongole1_15-1765739227286.png

Let’s add the Lakehouse we just populated with the AdventureWorks tables as a data source:

rajendraongole1_16-1765739268688.png

Fetching all the tables:

  • DimCustomer
  • DimDate
  • DimGeography
  • DimProduct
  • DimProductCategory
  • DimPromotion
  • DimReseller
  • DimSalesTerritory
  • FactInternetSales
  • FactResellerSales
rajendraongole1_17-1765739343195.png

We can select the tables and start asking questions about the data in simple, natural sentences.

rajendraongole1_18-1765739391738.png

Here I highlight the selected tables and the important instructions for easy reference:

rajendraongole1_19-1765739439736.png

For testing, this time I selected just one table, FactInternetSales, and wrote a question:

“What are the most sold products?”

rajendraongole1_20-1765739496003.png

Output: the agent analyzes the lakehouse tables and finds the answer:

rajendraongole1_21-1765739568669.png

The desired result for the question above:

rajendraongole1_22-1765739648945.png

You can also see the SQL query behind the generated output: no need to write it on your own, it is auto-generated. Cool, right?

rajendraongole1_23-1765739715013.png

Output: besides the SQL query, you can also see the result window below, with the output for the question we asked.

rajendraongole1_24-1765739757429.png

Next query: 

“How many active customers did we have on June 1st, 2013?”

rajendraongole1_25-1765739824080.png

Output: once you submit the query, the output follows as below:

rajendraongole1_26-1765739859922.png

Note: Part of the problem is that “active customer” doesn’t have a formal definition. Adding more instructions in the notes-to-the-model text box might help, but users may ask this question frequently, so you need to make sure the AI handles it correctly.
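One way to pin the definition down is to register an example question/SQL pair with the data agent. The snippet below is only an illustration: it assumes “active” means “placed at least one internet sale on or before that date”, a definition you would need to agree on with the business, and the table and column names come from the AdventureWorks tables loaded earlier.

```python
# Hypothetical example pair you might add to the data agent so that
# "active customers" always maps to an agreed-upon definition.
example_question = "How many active customers did we have on June 1st, 2013?"
example_sql = """
SELECT COUNT(DISTINCT s.CustomerKey) AS ActiveCustomers
FROM FactInternetSales AS s
WHERE s.OrderDate <= '2013-06-01'
"""

print(example_question)
print(example_sql.strip())
```

With a pair like this in place, similar questions are steered toward the same interpretation instead of leaving the model to guess.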

Use the Data agent programmatically

Both instructions and examples were added to the Data agent. As testing proceeds, more examples and instructions can improve the AI skill even further. Work with your colleagues to see if you provided examples and instructions that cover the kinds of questions they want to ask.

You can use the AI skill programmatically within a Fabric notebook, provided the AI skill has a published URL value.

rajendraongole1_27-1765739939040.png

Once you publish it, you can see the publish data agent wizard below:

rajendraongole1_28-1765739987148.png

Generate the published URL as below:

rajendraongole1_29-1765740045855.png

%pip install "openai==1.70.0"

Open the notebook again and execute the command above in a code cell.

rajendraongole1_30-1765740119698.png

The code below is a Python template designed to programmatically query a Microsoft Fabric Data Agent (AI Skill) using the Azure OpenAI Assistants API.

It demonstrates how to leverage the familiar OpenAI SDK to send natural language questions to a secure data endpoint hosted within Microsoft Fabric.

Summary of the Code
1. Custom Authentication
It defines a custom class, FabricOpenAI, that inherits from the standard openai.OpenAI client.

The key purpose is to handle authentication securely for the Fabric environment.

It overrides the request preparation to automatically inject the Fabric environment’s AAD token (retrieved via configs.driver_aad_token) into the request header as a Bearer token for every API call. This is essential for secure communication.

import requests
import json
import pprint
import typing as t
import time
import uuid

from openai import OpenAI
from openai._exceptions import APIStatusError
from openai._models import FinalRequestOptions
from openai._types import Omit
from openai._utils import is_given
from synapse.ml.mlflow import get_mlflow_env_config
from sempy.fabric._token_provider import SynapseTokenProvider

base_url = "https://<generic published base URL value>"
question = "What datasources do you have access to?"

configs = get_mlflow_env_config()

# Create OpenAI client
class FabricOpenAI(OpenAI):
    def __init__(
        self,
        api_version: str = "2024-05-01-preview",
        **kwargs: t.Any,
    ) -> None:
        self.api_version = api_version
        default_query = kwargs.pop("default_query", {})
        default_query["api-version"] = self.api_version
        super().__init__(
            api_key="",
            base_url=base_url,
            default_query=default_query,
            **kwargs,
        )

    def _prepare_options(self, options: FinalRequestOptions) -> None:
        headers: dict[str, str | Omit] = (
            {**options.headers} if is_given(options.headers) else {}
        )
        options.headers = headers
        # Inject the Fabric AAD token into every request
        headers["Authorization"] = f"Bearer {configs.driver_aad_token}"
        if "Accept" not in headers:
            headers["Accept"] = "application/json"
        if "ActivityId" not in headers:
            correlation_id = str(uuid.uuid4())
            headers["ActivityId"] = correlation_id
        return super()._prepare_options(options)

# Pretty printing helper
def pretty_print(messages):
    print("---Conversation---")
    for m in messages:
        print(f"{m.role}: {m.content[0].text.value}")
    print()

fabric_client = FabricOpenAI()
# Create assistant
assistant = fabric_client.beta.assistants.create(model="not used")
# Create thread
thread = fabric_client.beta.threads.create()
# Create message on thread
message = fabric_client.beta.threads.messages.create(thread_id=thread.id, role="user", content=question)
# Create run
run = fabric_client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# Wait for run to complete
while run.status == "queued" or run.status == "in_progress":
    run = fabric_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
    print(run.status)
    time.sleep(2)
# Print messages
response = fabric_client.beta.threads.messages.list(thread_id=thread.id, order="asc")
pretty_print(response)
# Delete thread
fabric_client.beta.threads.delete(thread_id=thread.id)

Use the + Code icon below the cell output to add a new code cell to the notebook, paste the code above into it, and replace the URL with your published URL. Click the ▷ Run button and review the output.

rajendraongole1_31-1765740330566.png

You can see the completed status of the executed cell below:

rajendraongole1_32-1765740373377.png
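The polling loop in the script above can also be factored into a small reusable helper with a timeout. A minimal sketch: the terminal status names follow the OpenAI Assistants API, and `get_status` is a hypothetical callable you would implement as a wrapper around `runs.retrieve` in the notebook.

```python
import time

def poll_until_done(get_status, interval=2.0, timeout=300.0):
    """Call get_status() until it returns a terminal run status or we time out."""
    terminal = {"completed", "failed", "cancelled", "expired"}
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout}s")
        time.sleep(interval)

# Example with a stubbed status sequence; in the notebook you would pass
# lambda: fabric_client.beta.threads.runs.retrieve(
#     thread_id=thread.id, run_id=run.id).status
statuses = iter(["queued", "in_progress", "completed"])
print(poll_until_done(lambda: next(statuses), interval=0))  # prints "completed"
```

The timeout guards against a run that never leaves `in_progress`, which the simple `while` loop above would spin on forever.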

Conclusion: Conversational Analytics is Now a Reality

The journey we’ve completed, from creating a Fabric workspace and loading the AdventureWorks dataset into a Lakehouse to deploying and programmatically querying an AI Data Agent, demonstrates a seismic shift in data analytics.

The Microsoft Fabric Data Agent empowers business users and data professionals alike to bypass the steep learning curve of SQL. It delivers on the promise of conversational analytics, making data access as simple as asking a question.

Happy Reading!!
