PowerPivot Data Refresh

In this posting we will take a look at the data refresh facility in PowerPivot for SharePoint and show you how to use it for maximum benefit. To start off, let’s begin with a simple question: So . . . what is data refresh? And why should I care? Data refresh is all about keeping your data current. It ensures that your users are making decisions based on the most current and up-to-date information possible. This is done with as little investment of your time as possible. We want to make it easy and straightforward to have your data be as current as it can be. Set it up once and it runs until you tell it to stop.

Example: Suppose you have spent several hours, maybe days, on an important workbook. You have gather data from five sources: three corporate IT databases, a local spreadsheet provided by a colleague with market share information, and finally a local table embedded in a worksheet that you have typed in by-hand. The local table drives a market penetration forecast that is the whole point of the workbook. The business problem at hand is this forecast. And you publish the workbook to your local PowerPivot for SharePoint site --- and all of your fellow team members LOVE IT! Excellent work. Your boss walks in the door and says “Nice work. But we’ve just published this month’s data, can you update the numbers?” Aww, Oh. Now what do you do? Updating the data wasn’t in your original plan.

Sure you could copy down the whole workbook and update the data on your local machine and re-publish, but there should be a way for you to automatically request that the data be updated for you. After all, all of the data is on the server. Why not? What about the fact that some of the corporate databases are updated weekly, some are monthly – do you have to manage that process yourself also? Why couldn’t the system update the data and then it is just there when you came into work the next day?

Well, the answer is that the system can – and that is what the PowerPivot data refresh facility is all about.

Before we get too deep into it, let’s talk a bit about what the data refresh facility isn’t. The word ‘refresh’ and ‘data’ is used a lot in SharePoint; all over Excel and all over most IT systems. In our case, we are talking about a specific kind of data refresh. We are talking about the data that is stored inside the PowerPivot workbook. When you have the workbook open, it is the data stored in the PowerPivot Client Window. It is the thousands and thousands (possibly even millions) rows of data that you have imported into the workbook.

clip_image002

It is that data. . . To update that data on the server, the following two restrictions have to hold:

1. The data has to be available to the server. This seems obvious, but sometimes its not. Files located on your local hard drive, or within the workbook itself (such as the local table) cannot be refreshed on the server. The data has to be located somewhere the server can get to it.

2. You have to ask that the data be refreshed. You have to setup a refresh schedule. You have to indicate what data sources to be included. The system does not do the data refresh by itself – well, it does – but only if you request it.

This raises an interesting question. Just because you have a workbook, does it have to be refreshed? Well, the answer is NO. It is up to you and your business problem. It is quite reasonable that you might consider it more important that you (as the expert) review the workbook before the data is published. It might be that assumptions and business rules that applied in the past are not longer valid. And a person would only know that if they saw the report prior to others using it. If this is the case then by all means DO NOT SETUP DATA REFRESH! Only use the facility if it solves your business problem.

However, for many classes of problems, the automatic daily and weekly refreshing of the data is as important to its use. Data can only be acted upon if it is fresh and relevant to the business. Stale data is misleading and could harm the very business processes that are being developed. Only you the expert knows the difference.

How to setup a data refresh schedule

There are two ways of setting up a data refresh schedule. First, you can use the standard document library option menu: the item is called “Manage data refresh”. The other option is to use the Gallery Silverlight control and click on the “Manage data refresh” icon.

clip_image004

To setup the schedule, click on “Enable” and provide the following information:

  • When to run the schedule? Typically you would click on daily or weekly, but there are lots of options here. Many of the options are only displayed when the relevant time period is selected. For example if you click on “Monthly” then the following options appear.

clip_image006

  • When should the refresh schedule run during the day? Normally you should enter “During after business hours” – which the administrator establishes as properties of the PowerPivot service application. If you don’t want to run the schedule then, you can enter in the starting time for when the job will run. This has two interesting side-effects: (1) the time specified is the ‘starting’ time – meaning that given system usage and available resources, the actual job may run later, this is just the earliest that it could run; and (2) that after a successful run, the schedule is automatically disabled. To be totally honest, the second side-effect is done on purpose to discourage users from not using ‘after business hours’. We think that that the ‘after business hours’ approach makes the most effective use of resources.
  • To whom should failure email notifications be sent to? You can enter several email addresses/users if you wish. To be notified when workbooks are refreshed successfully, use the normal SharePoint alerting facility to send emails out when new content on a document library arrives. This is normal SharePoint. This option is for failures.

Now let’s take a look at the Windows credentials to run the schedule under:

clip_image008

  • Who should the user be? There is one Windows user regardless of how many schedules are specified in this job. In this field you specify “Who” the user is that will run at 2am in the morning. It must be a Windows user. You have several options here, the end result of each one is an NT account:
    • “Connect using the credentials already stored inside the workbook” – in this case, the system will use the unattended execution account that is specified for the PowerPivot for SharePoint service application. Typically this is a low permissions account that has no trusted access to the various data sources. In this case, the data refresh facility will use the non-Windows authentication (i.e. sql logins) that have been specified on the connection string in the data sources.
    • “Connect using the following Windows user credentials” – The user can enter in a Windows username and password. We will store these credentials in the SharePoint Secure Store facility for the user. At 2am the data refresh facility will pull the credentials out and do a Windows logon using them – and the data sources can then make trusted connections.
    • “Connect using the credentials saved in Secure Store Service” – if the user has the ability to enter in their own Secure Credentials the data refresh facility will use them at 2am for the logon. If using this approach, the application ID must be a Secure Store group that includes the service account used by the PowerPivot service application (so we can read the credentials).

NOTE: These options are important because they establish the Windows environment for the job. If there are any trusted connections used in the workbook’s data sources, then this is the Windows user that will be logged on for the job; the credentials in the data source are used for non-Windows authentication..

Finally let’s take a look at the data sources contained in the workbook.

clip_image010

  • Lastly you will see a list of data sources. You can select all data sources, or data source by data source within this job. For each data source you specify if you want it included in the data refresh schedule or if you would like to schedule it separately. For a data source you can specify:
    • If you want this data source to be included in the default schedule, or you can specify a custom schedule. All of the same options for the workbook can be ran on each data source, e g. the third Tuesday of the month.
    • For each data source you can specify if you want the data source included in the default schedule with the workbook or separately. If separately you can control which tables are updated and when. The system knows what tables have been imported from what data sources. When you select a specific data source, then only those tables that are built upon it are updated using this schedule. If you have two tables that you would like refreshed at different periods but they use the same data source, then you have to enter the same data source twice so you can schedule the two tables independently.
    • The username and passwords that are specified for the data source can either be (1) from the embedded workbook (i.e. you entered the credentials in the client when the workbook was created), (2) custom username and password that you enter in by-hand (and we will store for you in Secure Store), or (3) they come from a Secure Store Application ID that you have created. In any case, these credentials are for non-Windows authentication. They are not used to establish a trusted connection – that is what the other Windows credentials are for up above in the dialog box.
    • If you do not want the data source to be updated, such as if the data source is contained on your C:\ drive, then uncheck the data source and it will be ignored when the schedule runs.

Once you have established a data refresh schedule then you will find that the “Manage data refresh” option first points to the history page of the workbook. The history page shows you when the data refresh schedule(s) were run and what the outcome of the job was (i.e. success or failure).

clip_image012

Conclusion

In summary, use the data refresh facility to keep your data up-to-date. Just setup your schedule and the system runs in the background automatically. Your data just arrives when asked for. The facility is designed as a simple, straightforward way to have end users schedule their own data refresh requests. The system runs the schedule automatically (typically ‘after business hours’) and then posts the updated workbooks back into SharePoint in the same document library where the workbooks came from.

Finally, we wouldn’t be honest if we said that the data refresh facility was the right solution for everyone. Clearly it isn’t. It is oriented to you, the end user; not to your IT colleagues. As a scheduler, it lacks many of characteristics that IT systems typically have. For example, there is no real-time or on-demand access to data; or the ability to issue frequent updates, such as hourly or every 5-10 minutes. These capabilities are important in their own way, and some systems demand them, but in the managed self-service world they come with a management overhead and complexity that is just not appropriate for our users.

Comments

  • Anonymous
    November 20, 2009
    Hi Dave,Great post, thx for sharing this with us.One question, what happens when a data refresh fails? Maybe a column is changed from int to varchar or is removed entirely, maybe a authorisation failure. Will the sheet be rolled back to the previous state, not empty i hope?Greetings,Kasper
  • Anonymous
    November 23, 2009
    On a failure, the error is logged so you can see it on the above history page. The existing file(s) remain untouched; it is only updated in the case of a successful refresh. The data refresh schedule will continue to fail until an owner/contributor fixes the problem.  The recommended course of action is to have an owner/contributor download the file and modify the schema/content to be consistent with the new format -- and then repost an updated version to SharePoint. If the upload is to the same location, then data refresh will automatically start again. This assumes that the person to repost the updated version is also the person who setup the schedule. If a different person updates the file, then the schedule is disabled.
  • Anonymous
    November 30, 2009
    Suppose we have a pivot connected to multiple data sources, is it possible to choose which source to refresh...if yes then this is something nice, if not this is nothing new
  • Anonymous
    December 12, 2009
    How update data from Access file?
  • Anonymous
    January 18, 2010
    Are yuo using the 32 bit or 64 bit version of Excel??I have tested the 32 bit and its very slow with rownumbers above 100 000 and a couple of related tables.
  • Anonymous
    January 18, 2010
    Are yuo using the 32 bit or 64 bit version of Excel??I have tested the 32 bit and its very slow with rownumbers above 100 000 and a couple of related tables.
  • Anonymous
    February 10, 2010
    Do you know if 3rd party OLAP-OLEDB support is being developed for PowerPivot in SharePoint 2010?  So if I create my Excel workbook pulling in from a 3rd Party OLAP provider (BPC, Oracle or etc), will I be able to upload that workbook to SharePoint and have it refresh data?  Of course there would need to be a 3rd Party OLAP provider for the PowerPivot Service App or Analysis Service install as well.
  • Anonymous
    March 17, 2010
    I like the power pivot work which is done, Is it possible to use PowerPivots power in isolation not using Excell,i.e., i have some dataset with multiple tables, which is as good a spreadsheet with different sheets, can we use this to make slicess out of it? This is will prevent us from writing the code and use the existing work which has already been done by PowerPivot binaries to get some usefull information out.Kind RegardsVinay
  • Anonymous
    April 23, 2011
    Microsoft has done us all a great dis-servcie by not deatailing how to setup third party external resources.  This has to be the most frustrating thing to setup...how do they expect us to use this feature when there is so much administrative overhead to set it up..the error messages provide no help or value.  Just meaningless errors that provide us with no detail of what is actually wriong.  I have spent weeks trying to get ORacle setup, then once I did on my development system, I run into more problems on my production servers.  SHAME SHAME SHAME on Microsoft....
  • Anonymous
    June 15, 2011
    I have also one BIG problem with refreshing data in sharepoint. After the powerpivot workbook refresh, I cannot see data inside it. Every worksheet is empty. I see only colums and filters names. What is the reason?
  • Anonymous
    September 20, 2011
    Thanks for posting.Had a question:I wish to create a list from the data refresh history (I could then display an indicator somewhere on the site stating whether the Powerpivot data is up to date or not) but was unable to do so.Any ideas?Many thanks.Arthur
  • Anonymous
    October 27, 2011
    Great Post!One question: Is it possible to apply data refresh schedule to multiple books?
  • Anonymous
    November 02, 2011
    Great post!Question though: I am reading from a local excel file. I uploaded the report and then added in a column on the local report. I than tried to do a data refresh on powerpivot table. It said it successfully refreshed, but the new column is not there. (I double checked the new column is on the local file i.e. it was saved properly.) any ideas?
  • Anonymous
    March 19, 2012
    Hi,when I do refresh PowerPivot on the server using scheduler it still creates new PowerPivot database. Can I setup it not to create new database?Thank you.Martin
  • Anonymous
    April 29, 2012
    We are in the year 2012 with SQL2012 and it's still not possible to schedule a datarefreh more then once a day ...so strange
  • Anonymous
    January 22, 2014
    Hi Dave,Is it possible for a SharePoint Administrator to view a list of all active Data Refresh Schedules, either through the UI or Powershell and edit or disable them?I have a workbook that has been deleted and is no longer in the recycle bin that appears to still have an active data refresh schedule which is repeatedly being refreshed every 2 minutes (as seen on the powerpivot service application home page). As the workbook has been deleted I cannot view the refresh schedule from the UI.ThanksFred
  • Anonymous
    June 24, 2014
    Hi Dave,Thanks for sharing your Knowledge with us,I have a question,Current procedure followed:- I have a SQL query from which I am extracting data from Oracle database and importing that to Access 2010 and from there, I have linked this Access File to Excel Power pivot from where it Imports the data and prepare a Dashboard. Daily I have to repeat the process, that update the access file and then refresh the PowerPivot to prepare Dashboard.Note: I have to change the Dates(Future Dates) in SQL query each day.Please tell me how can I automate this process or connect the Excel power pivot to  Oracle SQL Query directly, removing MS-Access from whole process.
  • Anonymous
    July 28, 2014
    I have an Excel 2010 pivot table that has multiple connections from 2 SQL database tables. I have tried multiple times to get another users pc setup so that they are able to refresh the Pivot table data. I tried with a direct ODBC connection to the SQL Tables and also Creating an .odc file placed in their "My Datasource" folder.I tried to refresh the data connection string and it fails saying that not a valid path see administrator.I am looking for the best solution in setting up a user so that they are able to manipulate the Pivot table. Please keep in mind that the original Pivot was created on 64bit pc and the user is 32bit, would that make a difference?
  • Anonymous
    September 13, 2014
    I have created a power pivot from external Excel data file, I have added a new column in the excel data file and now when i am try to refresh the power pivot, it doesn't show the new column I have added in the data. Please help
  • Anonymous
    August 26, 2015
    I have question regarding the update of the Pivots in the Excel Client: The PowerPivots get automatically refreshed but the Excel sheet  with the Pivot Chart based on the PowerPivot is not updated. So the user sees the old data in the Excel Service Webpart and has to update the sheet. How can the Excel sheet  with the Pivot Chart automatically updated?