Parallel DBCC CheckDB using GOlang
Parallel DBCC CheckDB in GOlang
In this post I will show you how to use the concurrent features of the GO (ie GOlang) language in order to perform some maintenance tasks in parallel. This is just a showcase of features so don't expect a enteprise-ready functionality . The problem we address in this case is the repetitive task of checking our databases for consistency. Internet is full of great quality scripts to address the issue. I will only show an alternative way of doing it as an excuse to show how easy is to reach SQL Server from GO and how to use its excellent concurrent features.
Setup
First of all, you need to install GO in your environment. Please go here https://golang.org/ and install the required binaries. Next you need to set one very important environmental variable called GOPATH: it will store all your source code (well, that's note exactly true but it's enough for here). Select a convenient folder for it (for example C:\dev\go), open the system settings panel and select Advanced system settings:
From here, click Environmental variables:
And add your GOPATH variable, either to your account (better) or system-wide.
To check your configuration just open a command prompt and issue go version. If everything is ok you should see the version installed:
Just to check, type also ECHO %GOPATH% you should see your previously selected folder.
Querying SQL Server
In order to connect to SQL Server there are many options available. The most straightforward way is to use the ODBC driver, but for the sake of this post we will use the excellent open source implementation by denisenkom. You can find it here: https://github.com/denisenkom/go-mssqldb. Wait, you don't need to clone his repository yourself. All you have to do is issue go get "github.com/denisenkom/go-mssqldb" and the go runtime will clone the package for you and install it in your GOPATH folder.
GOlang drivers allow you to code without tying yourself to a specific implementation. You can switch to another driver simply changing the driver signature. Please note that Microsoft does not support these open source drivers: you will need to contact the authors for support.
Open connection
Let's see how it's done:
Here the sql package is the standard package database/sql. You are telling it to use the "mssql" driver which has injected itself simply by anonymous import:
Query preparation
Once we have a valid connection (ie without err) you must prepare your statement using the Prepare method of the receiver *sql.DB:
Here we are simply retrieving the database list - projecting only two fields. If we were to specify parameters we would use the ODBC syntax typing a question mark. For example:
Here the question mark is our parameter. You don't need to specify the contents right now as you will be doing it in the next step.
Query execution
Now we can call the Query() method. This method is peculiar because can accept and arbitrary number of parameters. As you might have guessed, these are the query parameters defined before. So in the case of SELECT database_id, name FROM sys.databases (no parameters) all we have to do is:
To pass a parameter all you have to do is:
Data retrieval
This function returns the row collection as first parameter. All we have to do is to enumerate it and extract the data calling the Scan method of *sql.Rows:
You need to supply to the exact number of fields in the Scan method as parameters. If you don't know in advance how many there will be you can look at the rows.Columns() getter (see https://golang.org/pkg/database/sql/#Rows.Columns for more details) which will give you the fields in the recordset along with their names. In my case I know them in advance so I've prepared a struct to hold them:
Note: DBCC CHECKDB WITH TABLERESULTS is undocumented, unsupported and different between SQL Server versions. It's shown here just as an example.
The Scan code becomes:
Worth noting is the use of strings. GO does not allow nil strings. The default string is an empty string. In SQL Server there is a huge differenct between an empty string an NULL: you can test it in SQL Server Management Studio executing:
Which will give you different .
So in order to tell this difference you have to pass to Scan a pointer to a pointer to a string (that is *(*string)) so in case of NULL the pointer will point to nil, in case of an empty string the pointer will point to "". Don't worry, you cannot forget this because you will get a runtime error if your pass a simple pointer to string (that is *string).
defer
You might have spotted in the previous snippets the defer statement. It basically tells the compiler to call your function as soon as the exit of the scope. The defer is smart enough to be LIFO so if you do:
- open DB
- defer close DB
- open query
- defer close query
- open row
- defer close row
You are actually telling this:
- open DB
- open query
- open row
- close row
- close query
- close DB
That is what we want. The defer statement is handy because allows you to logically link the open and close operations without scattering them in your code in reverse order.
go functions
This special statement starts a concurrent function. The function can be anything and will be scheduled by the runtime without you having to do it manually. This is excellent for long running blocking operations such as database calls. So suppose we have our long running function; we can do:
Two things happen here. First we are declaring an anonymous function that requires a string parameter called connectionString. This function just calls the ProcessInstance function so it's very simple. Then we are calling go function(connectionString) for each connection string in our slice. Now, depending on the GO configuration, the machine cores and so on the functions will be processed concurrently. But wait, if the functions will go on their own, how can we be informed when they end?
There are may ways to do that, we can create mutexes (ok, but hard to maintain), shared objects (don't do that! ) or use channels.
GO Channels
Go channels are a thread safe data sinks. It other words you can send a receive data in different threads (or, better, go functions) without fear of concurrent access.
GO channels are blocking so if you try to get an item from an empty channel you will be blocked until it becomes available (you can override this behaviour, of course, but is not needed now).
We can ask each go func to put an item in the channel as soon as it is finished. Since we know how many go func we have started we know how many items we must receive.
The cEnd := make(chan int) creates the channel (empty). The function will add an item when it's done (the syntax is cEnd <- <item>). Notice how the go func can access the channel even if it's declared outside its scope and not passed as parameter. This is possible since this functions are closures.
Now from outside we just need to get from the channel a number of items equal the number of go funcs started:
Since we don't care of what value we get from the channel we just discard it.
Putting al together
With these notions it's easy to create our parallel check db script. You can get the code from this gist:
What we do is:
1. For each command line parameter start a go func
2. For each go func:
3. Open the connection
4. Get the database list
5. For each database:
6. Perform DBCC CHECKDB
This will execute steps 3-6 in parallel (that's what we wanted in the first place).
This is a sample execution with 3 servers as parameter:
Notice how the different executions are interleaved. Each server processes the entries without waiting for the others (as expected).
Going on
Well as you have guessed we still lack the proper interpretation of the CheckDB response. Right now we just tell the user that the CheckDB has started and has completed but we should also tell him if his databases are corrupt or not . I'll leave to you this task with a hint. There are two very useful packages you might want to import: strings and regexp.
Happy coding,
Francesco Cogno
Comments
- Anonymous
September 28, 2015
Very cool. Thanks for sharing.