Perf tips for using the N-Gram service with WCF
The support in Visual Studio for WCF makes writing a SOAP/XML application for the Web N-Gram service a pretty straightforward process. If you're new to this, the Quick Start guide might be helpful to you. There are a few tweaks you can make, however, to improve the performance of your application if you intend to call the service repeatedly. Today we'll cover a few tips.
- Check your web proxy
Your web proxy can be a bottleneck, especially in a corporate environment. Some proxies limit the number of calls an application can make in a given amount of time and return HTTP 5xx responses when you exceed that limit. It's often difficult to tell whether the error came from the web service or from your proxy, so check the error message carefully. If there's an alternative proxy you can use, try that one instead. Here's how you'd modify your application configuration file:
  - Find your application configuration file. Generally, this file will be called app.config or web.config.
  - Find the binding your endpoint is using under <configuration><system.serviceModel><bindings><wsHttpBinding> or equivalent.
  - Set useDefaultWebProxy="false", and
  - Set proxyAddress="https://your-proxy-name-and-maybe-port"
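Put together, the edited binding might look roughly like the fragment below. Treat it as a sketch: the binding name NGramBinding is a made-up placeholder, the proxy address is the same placeholder used in the steps above, and a generated configuration file will usually carry other attributes and child elements that should be left as they are.

```xml
<configuration>
  <system.serviceModel>
    <bindings>
      <wsHttpBinding>
        <!-- "NGramBinding" is a hypothetical name; keep the name your config already uses -->
        <binding name="NGramBinding"
                 useDefaultWebProxy="false"
                 proxyAddress="https://your-proxy-name-and-maybe-port" />
      </wsHttpBinding>
    </bindings>
  </system.serviceModel>
</configuration>
```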
- Batch your calls
If your task involves getting a bunch of probability values, the use of the batch methods is strongly encouraged. This is partly due to the amortization of XML parsing, and partly due to how lookups are distributed in the cluster. So instead of calling GetProbability or GetConditionalProbability, call GetProbabilities or GetConditionalProbabilities whenever possible.
- Reuse your WCF proxy
There's no need to create a new client proxy for every call to the service. Doing so adds significant overhead that's usually unnecessary. Instead, catch faults (exceptions) and recreate the client proxy only when one occurs. Bonus points if you have an exponential backoff during retries while dealing with network hiccups.
- Choose models wisely
It turns out that the larger the model, the longer the lookups take. Body, as you might guess, is the largest; anchor and bquery are next largest, and title is the smallest. You may also notice some performance difference between jun09 models and apr10 models due to some changes in how the data is hosted.
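To make the batching and proxy-reuse tips a bit more concrete, here is a minimal sketch of the pattern: keep one client around, recreate it only after a fault, and back off exponentially between retries. It's written in Python purely as a language-neutral illustration, not as WCF code. ReusableClient, TransientServiceError, make_client, and the default retry settings are all hypothetical names and values chosen for this example; in a real application the factory would construct your generated WCF client and the exception would be whatever fault type that client raises.

```python
import time


class TransientServiceError(Exception):
    """Hypothetical stand-in for a fault or network error raised by the client."""


class ReusableClient:
    """Holds one client, recreating it only after a failed call."""

    def __init__(self, make_client, max_attempts=5, base_delay=0.5, sleep=time.sleep):
        self._make_client = make_client    # factory that builds a fresh client proxy
        self._max_attempts = max_attempts  # assumed retry budget
        self._base_delay = base_delay      # assumed first delay, in seconds
        self._sleep = sleep                # injectable so tests need not wait
        self._client = None

    def call(self, operation):
        """Run operation(client), retrying with exponential backoff on faults."""
        for attempt in range(self._max_attempts):
            try:
                if self._client is None:
                    self._client = self._make_client()
                return operation(self._client)
            except TransientServiceError:
                # The old proxy may be unusable after a fault, so drop it.
                self._client = None
                if attempt == self._max_attempts - 1:
                    raise
                # Delays grow as base, 2*base, 4*base, ...
                self._sleep(self._base_delay * 2 ** attempt)
```

A caller would create one ReusableClient up front and pass it a single batch operation per group of phrases (for example, a lambda that invokes the batch method on the client with a list of phrases) rather than one call per phrase. The argument list of that batch method depends on your generated client, so it isn't shown here.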
Contribute your own tips!