I got a few questions lately on how to integrate the refresh of a Power BI dataset in the orchestration that Azure Data Factory offers. For instance to process your Power BI model directly after you have loaded new data in the data lake or the datawarehouse database. Effectively ‘daisy-chaining’ these activities. This absolutely makes sense, but can be quite tricky to set-up.
Part of the trickiness is due to the fact that I not only want a clean and easy solution, but I’ve also got to be able to put this solution in production. So, I’ve got the following list of requirements:
- Integrate in Azure Data Factory, for reasons stated above
- Easy deployment using Azure DevOps release pipeline
- No personal account for authentication
- Store sensitive data (secrets/passwords) in Azure Key Vault only
- No sensitive data in logs, while executing the solution
The ‘tldr’ of this solution is to leverage the support for Service Principals in Power BI and use the Power BI REST API in a Web Activity in Azure Data Factory. Sounds easy, doesn’t it? Well, it can be a bit of challenge to set-up, but with this guide it should be easy for you from now on ✔!
I’ll try to guide you from beginning to end with this setup, and describe all the necessary steps.
If you’re an expert, you can probably use the summary below and download the Azure Data Factory pipeline I prepared for you.
Overview of steps (details below):
- Create an AAD App Registration. Add a secret.
- Create an AAD Security Group and add the SP object
- In the Power BI tenant settings, register the Security Group for API access using Service Principals
- Create a Power BI App Workspace (only V2 is supported)
- Add the Service Principal as an admin to the Workspace
- Create an Azure Key Vault and add the secrets involved
- Create an Azure Data Factory
- Make sure Data Factory can authenticate to the Key Vault
- Create an Azure Data Factory pipeline (use my example)
- Run the pipeline and high-five the nearest person in the room 🙏
Make sure you can do or are the follow items, or sit next to the admins, or be prepared to file a ticket and be patient 😊.
- Power BI Tenant Administrator (or Azure Global Administrator)
- Permissions to create an AAD Security Group (Azure AAD Administrator)
- Permission to create a Power BI App Workspace (V2) (by default a Power BI Pro user can do this, but it might be disabled in your tenant)
- Permissions to create an Azure Key Vault, an Azure Data Factory, and to configure the Identity Access Management on the Key Vault (the OWNER role on an Azure Resource Group provides all of this)
Step #1 Create AAD App Registration
Navigate to https://portal.azure.com and open the Azure Active Directory blade. Then open the App Registrations section. Then click on ‘New registration’.
Provide a name for the application, try to make it descriptive enough to later remember what it is used for.
Copy the Application (client) ID value. We need this value at a later step, so temporarily save it in your favorite note editor.
Call it “ClientId”. Also copy the ‘Directory (tenant) ID’, call it “TenantId”.
Add a secret to the application, via the ‘Certificates & secrets’ section.
Important! Make sure you copy the secret value right after you click on ‘Add’, as you won’t be able to retrieve it after you leave this page.
Save this value, call it “ClientSecret”, you need this later.
Step #2 Create AAD Security Group
Go back to the Azure Active Directory blade and open the Groups section. Click on ‘New group’.
The group type is ‘Security’, give the group a decent name and description, perhaps a set of owners and most importantly: add your Service Principal as a member, by searching using the name or ClientId of the application.
Write down the name of the group in your notes, you need it later.
Step #3 Register Power BI API access for Service Principals
Go to the Power BI Service at https://app.powerbi.com and open the Admin Portal.
Go to the ‘Tenant Settings’ section, and scroll to the setting for ‘Allow service principals to use Power BI APIs’. Enable the setting and search for the security group you have made earlier. Click Apply when you have selected your group. This setting might take up to 15 minutes to take effect ☕.
Step #4 Create Power BI App Workspace
If you already have a Power BI App Workspace in the ‘new upgraded experience’ a.k.a. ‘V2’ style, you can use that. Otherwise, please create a new App Workspace. Note that this solution will not work with ‘classic’ App Workspaces, because they do not support Service Principal membership.
Step #5 Add Service Principal to the Workspace
Got you V2 App Workspace and open the Access settings. In the search box, type the name of your App Registration / Service Principal, it’s the item with “AppID: ..” behind it’s name. Select it. Give it the Admin role, and click ‘Add’.
The result should be like this:
Also, this is the moment to double-check the dataset you want to refresh from Azure Data Factory. Credentials set? Can you successfully refresh the dataset manually?
Step #6 Create Azure Key Vault and add our secrets
If you already have an Azure Key Vault ready to use for this solution (and you have the OWNER role), you can of course skip creating a new one.
To create a new one, open the Azure Portal at https://portal.azure.com, and go the Resource Group where you want to create the Key Vault. Click ‘Add’ and choose the Azure Key Vault as type of resource. Give it a proper name and make sure you add it to the right Subscription, Resource Group, and Location. Click Create.
Navigate to the new resource. Copy the ‘DNS Name’ and save it, we need it later.
Let’s add our secrets. Open the Secrets section and click ‘Generate/Import’ to add the “ClientId”, “ClientSecret”, and “TenantId” you have written down in previous steps.
We need the name and current version of the secrets. Click on each secret to retrieve the guid of the current version.
Step #7 Create an Azure Data Factory
If you already have an Azure Data Factory ready to use for this solution, you can of course skip creating a new one.
To create a new one, go to the Resource Group where you want to create it. Click ‘Add’ and choose Data Factory as type of resource. Give it a proper name and make sure you add it to the right Subscription, Resource Group, and Location. Of course we take Version V2. Click Create.
Step #8 Make sure Data Factory can authenticate to the Key Vault
Open the properties section of the Azure Data Factory and copy the ‘Manged Identity Application ID’ value.
Go to your Azure Key Vault, and open the ‘Access policies’ section. Click ‘Add new’ to add a new access policy.
Click on ‘Select principal’, paste the Managed Identity Application ID of the Data Factory, and select it.
In the ‘Secret permissions’, select the ‘Get’ permission. Click OK. Then, click Save.
Step #9 Create the Azure Data Factory pipeline
Now, this is where the real magic is happening ✨. We’ll take a little bit more time for this step, to explain what is happening.
I’ve created a pipeline for you with all the activities needed. It looks like this:
The pipeline has 3 ‘stages’:
- Grab the secrets from the Azure Key Vault.
- Call the AAD authentication service and get the AAD token that we need to call the Power BI REST API
- Use the Power BI REST API to trigger the actual dataset refresh
Stage 1: grab the secrets from the Azure Key Vault.
Important to note here, is that we use ‘MSI’ authentication. We will use the Managed Identity of the Data Factory to authenticate to the Key Vault (or actually its API). Remember we granted it access in an earlier step? It’s also imported to use “https://vault.azure.net” as the Resource value.
Stage 2: call the AAD authentication service and get the AAD token
Important to note here, is that we call the https://login.microsoftonline.com/<tenantid>/oath2/token URL with a POST method, and in the body of this call we send the following values:
- grant_type = client_credentials
- resource = https://analysis.windows.net/powerbi/api
- client_id = <the output of the previous Web Activity, to grab this from the Key Vault>
- client_secret = <the output of the previous Web Activity, to grab this from the Key Vault>
Stage 3: use the Power BI REST API to trigger the actual dataset refresh
Important to note here, is that we call the Power BI REST API (https://api.powerbi.com/v1.0) with a POST method, and as the Authorization header we send the AAD bearer token we retrieved in the previous Web Activity.
The Web Activity still requires us to have a value for the Body of the call.
Key in the setup of this entire pipeline is that the Web Activities do not log any sensitive data when they are executed:
Download the json file here.
Add a new pipeline and open the Code view.
Replace the code with the content of the json file, and click Finish. Rename the pipeline (example: “pl_PBI_dataset_refresh”).
Publish the pipeline.
#10 Run it!
The idea behind this pipeline is to execute it from another pipeline, for instance the ‘master’ pipeline of the Datawarehouse. But, you can also trigger the pipeline standalone.
Either way, you will need to provide the necessary values.
Probably you already have all of these parameter values at hand. Click Finish after you have filled in the form, and navigate to the Monitor view of the Data Factory to inspect your results: Status=Succeeded 😎.
It took in this case 8 seconds to call the API and wait for the dataset refresh to be completed (it might have failed though, but more on that later). Yes, for some reason the Web Activity executes a synchronous call (normally it’s in asynchronous mode). I like it.
And the evidence in the Power BI dataset refresh history:
High-five the nearest person in the room 🙏, you have now ‘daisy-chained’ an Azure Data Factory pipeline execution and a Power BI dataset refresh!
So we now have our solution ready and tested successfully. Let’s take a look at the initial requirements and see how we did.
- ✔ Integrate in Azure Data Factory, and limit it to that (no Logic Apps, Functions, Workbooks)
- ✔ Easy deployment using Azure DevOps release pipeline –> it’s super-easy to deploy Azure Data Factory pipelines using its git integration.
- ✔ No personal account for authentication –> we’re leveraging a Service Principal! Make sure to use a separate Service Principal per environment, and only use it for a single cause.
- ✔ Store sensitive data (secrets/passwords) in Azure Key Vault only (no secrets in code repo) –> our code is clean.
- ✔ No sensitive data in logs, while executing the solution –> input & output secured.
Ideas for next steps
I think it would be great to add a Web Activity at the end, retrieving the dataset history to see if the refresh actually succeeded, and to notify the operators of this process.
I’d like to daisy-chain a Power BI dataflow refresh with a Power BI dataset refresh 😎.