In today's workplace, it is common to have multiple sources of information of many different types, and how you handle them is key to making the right decisions. It is also common for some of that data to live in Excel spreadsheets. We all use them, and sometimes they contain key information that is not captured anywhere else.
Why is using Azure Data Factory essential for your company?
Being able to gather that data and combine it with your other structured data can surface key insights about your business. One way of doing this is with Azure Data Factory, a platform where users can create data pipelines.
In this blog we are going to explain how to use Azure Data Factory to gather data from an Excel spreadsheet and store it in one or more SQL Server tables, so you can use any ETL (Extract, Transform, Load) tool or reporting software to combine it with your other relational and non-relational data.
How do I import an Excel file into a SQL database?
The first step is to store the Excel spreadsheet somewhere Azure Data Factory (ADF) can access it. There are already many connectors built for Azure Data Factory; in this example we will use one of the less expensive Azure services, Blob Storage. A separate process saves and replaces the file in Blob Storage every time new data is added. Once the file is uploaded there, we can start the process in Azure Data Factory.
Step 1: First, create the linked service to the source (Blob Storage).
Step 2: Then create the linked service to the target. In this example it will be an Azure SQL database, but it could be any relational database, or even a CSV file stored in another storage location.
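Behind the UI, each linked service is stored as a JSON definition (visible in ADF's Code view). A rough sketch of the two linked services for this example — the names, account, and connection strings are placeholders, not values from this walkthrough:

```json
{
  "name": "SourceBlobStorage",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<storage-account>;AccountKey=<key>"
    }
  }
}
```

```json
{
  "name": "TargetAzureSql",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>"
    }
  }
}
```

In practice you would keep the keys and passwords out of the definition, for example by storing them in Azure Key Vault and referencing them from the linked service.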
Step 3: When creating the dataset for the source, select Excel as the format.
Step 4: In the connection settings of the dataset, there are two options for selecting sheets inside the spreadsheet, a setting called Worksheet Mode. Which option to choose depends on whether the sheet names in the file may vary over time.
The first option, Name, matches the name defined in the dataset against the sheet name in the file. If a sheet is renamed for any reason, the pipeline will no longer pick up the data on that sheet.
The second option, Index, assigns each sheet an incremental index from left to right, always starting at 0. Even if a sheet is renamed, the dataset will still select the same sheet by its position.
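In the dataset's JSON definition, the two worksheet modes correspond to different properties. A sketch of the relevant fragment, with an illustrative sheet name:

```json
{
  "typeProperties": {
    "sheetName": "Sales"
  }
}
```

In Index mode, `"sheetName"` is replaced by `"sheetIndex": 0` (or whichever position you need); the two properties are mutually exclusive.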
Step 5: If the file has several sheets, you must repeat this process and create one dataset per sheet you want to read data from.
Another important setting is the Cell range, which defines the subset of cells to read from the sheet. If a sheet has data in more than one range, you will need to define a separate dataset for each range.
Step 6: The final step in creating the dataset is to import the schema. In this example we will use the first row as the header.
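Putting Steps 3 to 6 together, the source dataset's JSON definition looks roughly like this. All names, the container, the file name, and the cell range here are illustrative placeholders:

```json
{
  "name": "ExcelSheetDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "SourceBlobStorage",
      "type": "LinkedServiceReference"
    },
    "type": "Excel",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "incoming",
        "fileName": "sales.xlsx"
      },
      "sheetName": "Sales",
      "range": "A1:F200",
      "firstRowAsHeader": true
    }
  }
}
```

If you omit `range`, ADF reads the sheet's used area; setting it explicitly protects the pipeline from stray data elsewhere on the sheet.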
Step 7: After creating the source dataset, we have to create the target dataset. In this example we will create a single table directly in the SQL database. The table will have the same columns as the Excel file, but you could map the result to any table or columns you define.
Step 8: When the dataset is created, import the schema, which should look like this:
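The target dataset is also just a JSON definition under the hood. A hedged sketch, with a placeholder schema and table name:

```json
{
  "name": "SqlTableDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "TargetAzureSql",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "schema": "dbo",
      "table": "SalesFromExcel"
    }
  }
}
```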
Step 9: Now that we have both datasets, we will proceed to create a new Pipeline. In the pipeline we will add a Copy activity like the image below:
Step 10: In the Source tab, select the Excel dataset created earlier, and in the Sink tab, choose the SQL dataset we created. The Mapping tab is where we define which columns from the Excel spreadsheet map to which columns in the SQL table.
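In JSON form, the Copy activity ties the two datasets together, with the column mapping expressed as a translator. This is a rough sketch; the dataset references and column names are illustrative, not from this example's file:

```json
{
  "name": "CopyExcelToSql",
  "properties": {
    "activities": [
      {
        "name": "CopyExcelSheetToTable",
        "type": "Copy",
        "inputs": [
          { "referenceName": "ExcelSheetDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SqlTableDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "ExcelSource" },
          "sink": { "type": "AzureSqlSink" },
          "translator": {
            "type": "TabularTranslator",
            "mappings": [
              { "source": { "name": "Order Date" }, "sink": { "name": "OrderDate" } },
              { "source": { "name": "Amount" }, "sink": { "name": "Amount" } }
            ]
          }
        }
      }
    ]
  }
}
```

If the Excel headers and table columns match exactly, the mapping can often be left implicit; an explicit mapping is safer when names differ or columns need to be skipped.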
Step 11: Finally, save the pipeline and either execute it manually or define a periodic trigger. The result should look like this:
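A periodic trigger can likewise be defined in JSON. A sketch of a daily schedule trigger, assuming the pipeline is named as in this example; the start time is a placeholder:

```json
{
  "name": "DailyRefreshTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CopyExcelToSql",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Remember that a trigger must be started (published and activated) before it begins firing.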
If any transformations need to be done to the data in the Excel spreadsheet, we can also use other activities or transform the data using a Dataflow.
As a company, you will have data scattered across many different systems. By using Azure Data Factory to load Excel spreadsheets into structured databases where the information can be analyzed for insights, your team can be more productive, finding all the relevant information in one place.
Being able to acquire, cleanse, transform and store all kinds of data, and going even further by defining a model that suits your company's needs, is key to any business's data journey. At ProServeIT, we have helped companies in many sectors transform their results through this data process. For more information on how we can help your organization turn your data into a growth strategy, visit our Insights & Automation page and our Power BI – Cloud solutions page.
To get started, book a call to discuss your organization’s unique requirements.