Connecting to a PostgreSQL Instance Using SQL Server Integration Services

Slalom Consultant David Van De Somple

Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modelling, ETL processes and reporting. He's based in Slalom's Seattle office.

About two years ago my current client tasked me with building a SQL Server reporting instance complete with a set of SSIS (SQL Server Integration Services) ETL processes that would connect to a production PostgreSQL database and retrieve the previous day’s data.  To be honest, this was the first time in my eighteen year career that I had seen a PostgreSQL database. And as I was about discover, connecting to it with Integration Services for ETL was not a straight-forward task and, if my Google searches were any barometer, not a very common one.

PostgreSQL is an open source relational database management system.  Its progenitor, Ingres, was developed in the mid 1980’s and continues today as another open source RDMS.  While completing a business degree in 1990 I cut my RDMS teeth using Ingres on an HP Vectra 286–my rather plebian 8GB iPhone 3G would run circles around this machine. Visiting the PostgreSQL web site you will find the following quote on the home page: “The world’s most advanced open source database.”  In many respects I can agree with that statement.  But outside connectivity is one area of PostgreSQL that in this author’s eyes has fallen woefully short.  That is not to say it is impossible to make a connection from SQL Server to PostgreSQL, but it certainly isn’t easy; the only option is using the PostgreSQL ODBC driver and precious little documentation is included with it.

When introduced in the early 1990s, ODBC was a great method for connecting one type of RDMS to another.  But by today’s standards, ODBC is slow and primitive and most vendors have moved on to faster and more efficient connection methods.  And therein lies the problem with the PostgreSQL ODBC driver, it is slow and primitive and until about 6 month ago, was only available in a 32 bit version.

The 32-bit PostgreSQL ODBC driver presents a major issue if you are using a full 64-bit architecture for your SQL Server instance.  The problem is that when you use SQL Agent to schedule a SSIS package, it is calling the 64-bit version of ODBC and if there is no 64-bit ODBC driver and data source available, the package will fail.  A colleague of mine suggested that I try and force SQL Agent to the call the 32-bit ODBC driver at run time.  Despite my best efforts I was unsuccessful in getting this to work.  Setting up a separate 32 bit SQL instance to broker the job and integration package was the only workable solution I could find.   So imagine my delight last month when I discovered that the PostGRES development community finally released a reliable 64-bit ODBC driver.

Installation of both drivers (32-bit & 64-bit) is identical.  Here are the basic steps to get the driver working:

  1. Download the latest .msi file for Windows from this location: http://www.postgresql.org/ftp/odbc/versions/msi/
  2. Run the msi file on your SQL Server.
  3. Launch the ODBC Administrator Utility and choose the type of data source you need; File, System or User and click Add.
  4. The ODBC Administrator will present a list of drivers.  Scroll to the bottom and you will see two options for PostreSQL; ANSI and Unicode.  Select the version you need and click Finish.
  5. The Administrator will present a screen on which you must supply a database name, server name, user name and password.
  6. After you have supplied values for these fields, click the Datasource button and make sure the Use Declare/Fetch box is checked.  The driver will fail to retrieve larger datasets if you do not check this box.  I have not yet found a satisfactory answer for why this is so.

Now you are ready to build a new connection manager in SSIS and hook it to the PostGRES data source you just created.  Use the following settings when building out the connection manager:

  1. Select the .Net Providers\ODBC Data Provider.
  2. Select the “Use connection string” radio button.  Using the values you configured in the ODBC Administrator, build a connection string as follows:  Dsn=PostgreSQL35W;uid=User1
  3. Enter the User name and Password in the fields provided.
  4. Test the connection and you should be ready to go.

Installing and configuring the 64-bit driver might be identical to the 32-bit version.  But as I quickly discovered, calling PostgreSQL queries with the 64-bit driver is NOT identical to the 32-bit version.  Consider this basic SELECT statement which caused no problems when called using the 32-bit ODBC driver:

SELECT InvId, CustId, PrevBalance, Amount, Tax, Total, NewBalance, DateDue, DatePaid, Paid_p, InvNumber, InvDdate, CreateDate

FROM tblInvoice

When called from the 64-bit driver, this very same query failed with the following error:

Description: The data type of “output column “Paid_p” (795)” does not match the data type “System.String” of the source column “Paid_p”

What makes the column “Paid_p” special enough to throw this error in Integration Services?  Some investigation revealed that this column was defined in the PostgreSQL schema as a Boolean; SQL Server does not have a formal Boolean data type.  At this point some of you are probably thinking, “But wait, SSIS supports a Boolean data type. Why not just import it as Boolean and then use the Data Conversion object?”  That was the first thing I tried, but was not successful.  Here is why: It appears the PostGRES 32-bit driver implicitly converts the Boolean data type to a Unicode text data type (NVARCHAR) before handing the data to Integration Services, and SSIS is perfectly happy with this. But from the error thrown by SSIS when using the 64-bit driver, it was clear this version handles Boolean data in a different manner from the 32-bit version.

After some basic testing, I could infer that the 64-bit driver does not implicitly convert Boolean data, attempting instead to maintain the original data type.  But from the error message it was evident that SSIS was unable to interpret or convert the Boolean data it was receiving from the driver. That meant I had to convert the data at the source PostgreSQL SELECT statement so when it arrived at Integration Services, it was already in a format SSIS could interpret.  I found two successful methods of translating Boolean data and the one you use will depend on what you want your Boolean data to look like in SQL Server.

  • This first example will return a literal “True” or “False”.

SELECT CAST (Paid_p AS TEXT)

FROM tblInvoice

  • This second example will return literal “1” or “0”.

SELECT      CASE

WHEN Paid_p IS TRUE

THEN  1

ELSE 0

END AS Paid_p

FROM tblInvoice

Having solved this issue with Boolean data conversion, I now have the 64-bit driver deployed and running correctly in a production environment.

Working with PostgreSQL from SQL Server Integration Services is not impossible, but it does present a few challenges.  My own experience leads me to believe this is something not often done in production environments, which is why documentation is somewhat scarce. The release of a reliable 64-bit ODBC driver increases the chances of success.  But be sure to pack your patience, testing and experimentation skills.

subscribe by emailSubscribe to be emailed about new Business Intelligence posts.

About David Van De Sompele
Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modelling, ETL processes and reporting. He's based in Slalom's Seattle office.

32 Responses to Connecting to a PostgreSQL Instance Using SQL Server Integration Services

  1. Pingback: Open Source Business Intelligence on the Cloud « DECISION STATS

  2. I am in accordance completely

  3. All round good blog post!

  4. JingJaiShop says:

    An all round amazingly written blog post!!!

  5. doitright says:

    thanks for sharing

  6. alan says:

    This is exactly what I needed. Like you, I’ve been a developer for 15 years – Oracle and SQL Server, and have never touched a postgres db…

    • Gilles says:

      Hi
      Thx for your post
      Have you got any idea about how to set up the format date…
      In using ADO.net (ODBC Data Provider) under SSIS
      I can’t select the data field because it makes no difference between 13/12/12 and 12/13/2012 :
      I ve got the message : ERROR 22008 date/time field value out of range
      Thx for your help
      Gil

      • Hi Gil,

        You may need to format your date values on the PostGRES side-using the PostGRES SQL language-so that when the data reaches SSIS it is already in a date format that SQL Server understands.

        Another approach you may consider: On the SSIS side declare the incoming date value as a varchar or string. Once you have it as a varchar in SSIS it should be easy to extract the date parts and re-assemble them as a proper date and then do an explicit conversion to datetime.

        Hope that helps,
        David 🙂

  7. Billy says:

    Hi David,

    thanks for your post. I’ve been trying to push data between SQL Server 2012 x64 and Postgres 9.1 using psqlODBC (32 and 64 bit versions) recently.

    I can create a linked server that connects to Postgres and I can browse the data via that but when I try to use an ODBC connection in SSIS or the SSIS Import export wizard it fails.

    The error from the 32 and 64 SQL Server Import and Export Wizard is as follows:

    —- Error ——-
    Column information for the source and the destination data could not be retrieved, or the data types of source columns were not mapped correctly to those available on the destination provider.

    “atable” -> [dbo].[atable]:

    – Cannot load the column attributes from the schema using the current provider descriptor:

    The column attribute “COLUMN_SIZE” is not valid.
    The column attribute “DECIMAL_DIGITS” is not valid.
    The column attribute “COLUMN_SIZE” is not valid.

    Please verify the ProviderDescriptors.xml file contains the data provider descriptor that matches the schema of the used data provider.
    —————–

    If I execute a package in SSIS, it errors immediately with:

    [ODBC Source [2]] Error: The AcquireConnection method call to the connection manager S32PostgreSQL35W.postgres 1 failed with error code 0xC0014009. There may be error messages posted before this with more information on why the AcquireConnection method call failed.

    I would appreciate any suggestions.

    Thanks

    Billy

    • Martina says:

      Hi Billy,
      I had the same error about the AcquireConnection method failing in SSIS. It looks like you are using the 32bit driver based on your connection naming convention. When I changed the Project Properties – Debugging – Run64BitRuntime property to False it worked like a charm. This forced Visual Studio to debug in 32-bit mode.
      Cheers,
      Martina

  8. DKS says:

    Hi David,

    I need to connect to MS SQL SERVER database via PostgresSQL 9.1 . I have installed it with after invoking init.exe
    “D:\POSTGRE\pgsql\bin\initdb” -D D:\POSTGRE\pgsql/data

    Now It will be great if you give me direction how to achive the same.

    Warm Regards,
    Durgesh

  9. Pingback: SSIS: Connect to PostgreSQL | The Data Queen

  10. Rod Merritt says:

    Excellent help, just what I needed. Thanks!

  11. Rod Merritt says:

    Well written article to be sure. Thanks. However I wonder if you can help me figure out an error message. When at the step to test the data source in connection manager, I get the following error:

    TITLE: Connection Manager
    Test connection failed because of an error in initializing provider. ERROR [IM014] [Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application

    I am on a 64 bit box using Win 7 and the 64 bit ODBC driver and the postgres db server is 64.

    • Rod Merritt says:

      Wow, three people looking into this here and still no luck. SSIS will connect enough to create a destination table on postgres, but when I run the package, I still get the error message.

      • Rod, do you have both versions (32 bit and 64 bit) of the Postgres ODBC driver installed on your server; the one running SSIS? This is often necessary in addition to creating an identically named Postgres data source in both versions of the ODBC admin tool.

        Sent from my iPhone

  12. Rod Merritt says:

    Apparently SSIS is still 32 bit app and setting the run64bitruntime to false didn’t help. So I installed the 32 bit ODBC driver and created a DSN there and called it from SSIS succesfully. So I don’t know how you used the 64 bit ODBC driver though.

    • Rod,

      This is versioning issue. Specifically, some parts of SQL Server, at least 2005; the version I was using, and SSIS are 64 bit and other parts are 32 bit. So for example, if your server is running 64 bit OS and 64 bit SQL Server, then at runtime-executing your SSIS package as a scheduled job-SQL will call the 64 bit ODBC driver. At development time, BIDS (Visual Studio) is still a 32 bit application and will call the 32 bit ODBC driver. This is why ultimately, you need both 32 bit and 64 bit ODBC drivers installed.

      Hope that helps.
      David 🙂

      • Rod Merritt says:

        Thanks David! I have both 32 and 64 bit drivers, tried calling them both from SSIS, changed the 64 bit otion in SSIS and tried both again. Some combination (I forget) would allow me to create a destination table in Postgres and see the mapping afterwards, but when executing the package it would bomb out with an ‘architecture different’ error. This situation took so much time, I downloaded Pentaho and used the jdbc connection to Postgres with no problems.

  13. Brian says:

    I have a dumb question but I can’t figure it on my own. I have MS SQL installed, but I can’t find anything called connection manager. I don’t know how to do this step:
    “Now you are ready to build a new connection manager in SSIS and hook it to the PostGRES data source you just created. Use the following settings when building out the connection manager”

  14. Gunjan Juyal says:

    A very well written article, clearly explaining the context/environment and the exact issue. It helped me figure out and solve an issue I’ve been struggling with since the last 3 days, thanks a lot!

    I work with one of the largest domain registrars in the world, and our primary database is Postgres (it’s not that rare after all!). We have a MS SQL Server Db that is used for reporting and OLAP, and there are several data warehousing jobs that sync the data between the two, using the Postgres ODBC driver to communicate.

    Currently we’re in the process of migrating from our old SQL Server 2005 Db to a new machine with SQL Server 2008, while our data warehousing jobs have been created in DTS (SQL Server 2000). I was unable to get the data copy jobs in the new machine to work properly, and they broke for the larger tables giving an “insufficient memory to complete operation” or “Out of memory while reading tuples” error, even though sufficient physical memory was available. All efforts to locate the issue in DTS or the SQL Server went in vain.

    The fix: While creating the ODBC data source you need to check the “Use Declare/Fetch” option, as mentioned in this article (thanks again for the tip!).
    I did some more research on this option to figure out the exact cause of the issue. “Use Declare/Fetch” opens a cursor to the PG database and automatically maintains a cache of 100 rows. This is very useful if all you need to do is read from the source and not update. When this option is not checked then a cursor will not be used and the driver will fetch all records in memory, which was the issue in our case. Am still not sure though why the process didn’t use more than ~700Mb memory even though enough physical RAM was available.
    Here’s an article describing the driver config options in more detail: http://psqlodbc.projects.postgresql.org/config.html

  15. Jamie Irwin says:

    Your blog post was very helpful.

    I too am having a problem using the 64bit ODBC driver. After installing the latest driver( v9.01.02.00 from 8/20/2012) and attempting to create a new user or system data source The ODBC Administrator crashes.

    I’m running it on Windows Server 2012 Standard.

    Any thoughts?

    Thanks,

    Jamie Irwin

    • Jamie,

      I have never used the PostGRES ODBC drivers on Windows Server 2012. The servers on which I’m using the PostGRES drivers are all running Windows Server 2008 Enterprise version. So it is possible you are seeing an incompatibility between the PostGRES driver and Windows 2012. That is my initial thought. Have you googled to see if anyone else is successfully using the PostGRES drivers on Windows 2012? Best David 🙂

  16. Vincent says:

    This blog is very helpful.

  17. Pingback: PostgreSQL en Integration Services. Parte I: Instalación del driver | Juan Esteban Quiroz

  18. Kapil says:

    This is a great article and I came across when i’m tasked with connecting PostgreSQL server to SharePoint 2010. In order to utilize Business Connectivity Services in SharePoint I need information. Do you have any pointers how to achieve the same? Basically I’m thinking on the lines if I can create a SQL database and map it with PostgreSQL database and use this SQL db to connect to SharePoint.

  19. eviriyanti says:

    Thanks for sharing, it really help me.

  20. leesa says:

    somebody help me, i dont know how to integrate two different database that is postgres n sql server 2008

  21. zaheer says:

    I have found this informaiton very helpful to accomplish my ETL project started few months back. I could never connect my package using 64 bit version then I had to configure 32 bit driver using Configuration of 64 bit ODBC driver for PostgreSQL on Server C:\Windows\SysWOW64\odbccad32.exe and made couple of change in my package to run it as 32 bit application. For data type limitation, I found a work around using PostgreSQL built in date functions at source which worked very well.

  22. vil coyote says:

    hello
    please look at this site for your problem, you must install an OLE BD provider for PostgreSQL :

    http://www.mssqltips.com/sqlservertip/2619/export-data-from-postgres-to-sql-server-using-ssis/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: