In this blog post, I will walk through the steps required to build a Dataflow in CDS to sync data between an external datasource and CDS. Dataflows make the job easier when it comes to synchronizing data from different data sources (Excel, SharePoint, SQL, Access etc.) and data can be updated either on a schedule or manually.
Problem Statement
There could be cases where data is stored on multiple data sources based on applications. This could be SharePoint, Excel, SQL Server, On-Premise etc. and we want to sync that data into CDS and centralize everything. The question is how to configure an automated process to update all the data in appropriate data types? How to perform calculation, apply conditions and make data more meaningful before copying?
Solution
Dataflows come to our rescue. By following simple yet important steps, we can create a process to copy all the data from a different data source to CDS. We can also transform the data to save it in required format. Dataflows can be set to refresh on a schedule or can be manually triggered.
Let's get started with building a Dataflow.
Let's take an example of Student Database that is stored in SharePoint and now there are certain new applications to track the student progress built on top of CDS. Both applications are equally important but the data needs to be analysed at a one stop point. We will create a dataflow to get the data from the SharePoint List and copy it over to CDS.
Step 1: Understand the Source Data.
In our scenario, the SharePoint list stores all the information of marks scored by a student across subjects through an academic year.
List Name: Student Academics
List Attributes:
Student Name -> SLT
Year -> Number
Class -> SLT
Subject -> Choice
Marks Obtained -> Number
Step 2: Understand the Destination Data.
The over all student metrics and relevant data is all stored in CDS and the academic progress needs to be tracked here too. Based on the source, we need to have appropriate fields in place in the CDS to copy the required data. Now, to appropriately map this data, let's introduce some look-ups and options set to see how we can update fields with different data types.
Entity 1: Students -> This stores details about the students.
Name -> Text
Address -> Text Area
Checkpoint 1: We will have to make a key that would be unique to map the data while creating dataflow. Make a note of this as we would require the key in later steps.
Steps to add key:
Open the Students Entity and then navigate to Keys tab. select Add Key and fill in the details as per the screenshot below.
Note: To make the lookup field work in Students Marks entity, this entity should be in place with all the students data. Otherwise, the data flow will fail because it will not find the appropriate record to map the data.
Entity 2: Students Marks -> This will store data for individual students.
Student -> Lookup -> Students
Year -> Number
Class -> Single line of Text
Subject -> Option Set
Marks Obtained -> Number
Checkpoint 2: Note the actual values for all the option set options. This is because the dataflow requires the actual value of the options instead of label for updating the options set.
Steps to get Actual Value of options:
Open the option set and select three dots beside option you want to get value of. Select View More and note down the value.
Step 3: Start building the Dataflow
Select Data from the left menu and then select dataflows. Select New dataflow to start building one:
Give your dataflow a name and then click Next.
Now, select the datasource, (which is SharePoint in this case).
Create a connection to the datasource. (In case of SharePoint, you need to enter the Site URL. You can use already created connections as well.)
Select Next.
Now select the table/list you want to add as part of this dataflow.
Select Transform Data.
Note: If you have created dataflow outside the solution, you can add it as an existing component and it will be ready to export.
Step 4: Transform Data. Add Conditions for attributes if applicable
If you do not wish to customize any column data, then you can directly select Next to map the fields and move to Step 5.
In this case, we want to update an option set type column and that's why we need to create an additional column.
There are a number of transformations you can perform on the data to make it appropriate before the field mapping.
Available Options:
1) Perform table level operations like duplicate detection, make first row as header etc.
2) Transform data on existing columns like concatenation, split, transpose etc.
3) Add a new column
In this case, we will create a conditional column to update option set.
Select Add column and then select Conditional column. Get the values from Checkpoint 2 and update them as per below image.
Note: The Subject column in SP list was a choice column so we will need to use SubjectValue to get the text.
Click OK and verify the newly added column and then select Next.
Checkpoint 3: The conditional column for option set.
Step 5: Field Mapping
Once all the columns are in place, they can be mapped to destination attributes as shown in the below image.
Note: You can also choose to create a new entity or add to an existing one.
1) Class is an SLT field on both SP and CDS entity.
2) Marks Obtained and Year are both Numbers on SP and CDS Entity.
3) Student is a text on SP list while a LookUp field on CDS entity and this is where we are able to select the values because of key we created in Checkpoint 1.
4) Subject is a choice field on SP and an Option-Set on CDS entity. This is configured through custom column created at CheckPoint 3: .
Step 6: Refresh Settings
There are two options to select from:
1) Refresh Manually -> This would require manual attention and will only be triggered when the dataflow is triggered manually.
2) Refresh on Schedule -> This would require us to provide a startdate and intervals on which the dataflow should be triggered.
Step 7: Error Handling
In case of any errors, you can select the dataflow and view run history. If the dataflow refresh failed, it will provide a detailed summary of the points at which the refresh failed.
You can open the refresh history by selecting the dataflow as shown in the below image and it will show you a history of executions till date.
Now, you can select and download error logs for individual runs. The error logs can be monitored to understand the failed executions and the reasons as shown in the image below.
As per the above image, the error highlights that it is unable to find the lookup records that it needs to map to the Students entity.
Once the dataflow run is successful, the data from the SP list is copied to CDS:
Export Dataflow
A dataflow can be included as a part of a solution and then exported.
In this post, we saw how you can build automated dataflows to copy and sync data from a SharePoint List to CDS. Similarly, dataflows can be created for other data sources like SQL, MS Access, MS Excel etc.
I hope this was useful for you. In case of any questions or suggestions, feel free to reach me out on twitter at @agarwal_ritika
Comments