Importing data into Gemini
This time around we will take a closer look at ways of getting your data into Gemini model. In particular we will into importing data using “conventional” ways. Using variety of OLEDB/ODBC/.Net client libraries.
Choosing Data Source
To import your data click on the “From Database ” in the Gemini Client Window and enter Import Wizard.
Here you see long pre-populated list of data sources you can import from. If you don’t see your favorite data source, you can click choose “Others” and Import Wizard will let you build your own OLEDB connection string.
Click on the “Build” button in the “Specify a Connection String” page and Import Wizard would launch Data Link Properties dialog. Click on the “Provider” tab in the dialog and you will see even longer list of OLEDB providers registered on your machine:
You can then choose any of the providers, but for the interest sake choose a “Microsoft OLE DB Provider for ODBC Drivers”.
Using this option you could then import data into Gemini model using any of the ODBC drivers you have installed on your machine.
But what about .Net providers? There isn’t a link or any other high level selection you make. The choice for particular provider type is accessible as advanced option once you’ve selected the type of data source.
Let’s go back to the first page of the Import wizard and choose “Microsoft SQL Server” as a source. Click next and in the next page click on the “Advanced” button right below the Database name.
The Advanced properties dialog is available for you to control fine details of your connection. You can pick and choose particular provider type from the Providers drop down box. You can change any of the provider properties in the properties window below. You can also see the complete connection string used to establish connection in the box below. If you see empty property box like this:
This means that you don’t have particular provider installed. Go back and install the provider you would like to use.
Choosing data to import
Once you’ve selected a datasource you would see a choice to import data by ether selecting list of tables or by writing your own SQL query.
In CTP2 Gemini Import Wizard does not give you a SQL Query builder. You simply get a text box where you can either write or paste a query you’ve constructed elsewhere. Gemini Import Wizard will try to validate the query and will not let you proceed if you have a syntax error.
If you choose to go with Tables selection, you would see a “Select Tables and Views” page
Here you can pick one or many tables. “Select Related Tables” will cause Gemini Import Wizard to select all tables related to the once you’ve already selected if this information is present in the data source. Import Wizard will also remember table relationships and create these between tables in the Gemini model.
You can also preview, explicitly select the columns, and filter content of particular table
Don’t worry if you don’t see entire content of your table- Preview dialog is only showing the first 50 rows using something like “Select Top 50 <columns> from <table>”.
This dialog also lets you to delete any columns you don’t need and set filters to limit amount of data you’re bringing into Gemini.
It is very good idea to restrict the import to just the data you need. Large number of unnecessary columns or rows will take up unnecessary space in the Gemini model.
Once you’ve made your choice, the last page – Import Summary, lets you review your choices. Just in case you forgot something and need to go back before import starts.
Clicking on the finish button would bring up the Progress Dialog and would start actual import of the data.
Here you see number of rows imported per table and if any errors occurred during the import you could see an error message by following a link that appears in the Message column. All tables are imported serially and speed of data import largely depends on the ability of your data source to deliver the data. While importing large tables you would see progress showing count of rows imported (in increments of 10,000).
The last row in the progress report – “Data preparation” is the stage where Gemini creates relationships between tables.
If you forgot something or import takes too long, you can cancel import. In this case all tables that show “Success” will be imported and any following tables would show error. Also the ability of Gemini import wizard to cancel import operation might be dependent on ability of the relational backend to cancel the query Gemini sends to it. Imagine you’ve constructed complex SQL query to bring data into Gemini and it takes awhile for relational database to process it. If you try and cancel out of import operation, the Gemini Import Wizard would forward cancel request to the relational database but that might take longer to respond, keeping you waiting.
Conclusion.
We discussed various details on how you can import data from various data sources into the Gemini model. If you’ve worked with Excel or other data modeling tools – data import in Gemini should be quite intuitive. Some of the concepts of the importing data are still little rough. But this is the first CTP and we will try to improve on data import experiences further. In the meantime, if you have any specific data import questions or observations, please give us feedback on the Gemini newsgroups.
Comments
- Anonymous
February 25, 2010
I am not a genius but the one thing i noticed working in excel 2003 was that after 65,000 rows thats it now you go to next sheet. This issue of 65,000 rows was addressed to ten lakhs rows in Excel 2007. O.K. but supposed is have lot of excel data on different sheets O.K. can't i just cut and paste here one behind the other and use the filter to know the my results, i.e. why do we have to impose the restriction on the number of rows in excel. - Anonymous
April 26, 2010
The row resriction in Excel 2007 is 1 million. I'm sure Excel 2010 supports at least that many. However, PowerPivot does not have the same restriction. Think of it as a separate application within Excel. - Anonymous
May 06, 2010
Hi there,I am trying to import data with a OLEDB provider. This is a own developed OLEDB Provider for SAP tables.It seems that your data import do not work with all OLEDB data types.It seems that your import wizard is not working withDBTYPE_NUMERIC or DBTYPE_DBTIMESTAMP.Or are there any restrictions?Where can I find these?Thanks,Andi - Anonymous
May 10, 2010
Andi, check this out to connect PowerPivot and SAP:http://blogs.msdn.com/powerpivot/archive/2009/09/09/importing-data-into-gemini.aspx - Anonymous
May 25, 2010
Krishtan,the link does not show any additional info.By the way, we have researched a lot and there is definitely a bug inside PowerPivot concenring OLEDB providers and date columns.see: social.msdn.microsoft.com/.../4d87e007-ae98-41a0-b980-b39b23c57ba6Date columns work only at the end of the selection list.Hope this gets fixed till the final release.Andi