Too many tiers? Writing efficient multi-tier web applications.

A while ago, we were all convinced that dividing our applications into multiple tiers was the way to go. It was great because it let you scale up the parts of your application that might represent a bottleneck, and gave you more control over which resources were allocated to which tier. Windows DNA was built on this concept, as was the whole “Duwamish” architecture. The problem is that this also opens the door to very “sloppy” implementations of the ideal that create a much larger bottleneck, especially when slower mechanisms are used to transport data across tiers.

Suppose you have a SQL Server backend that pretty much spends its time running stored procedures and returning datasets. You’ve written a stored proc for anything anyone might want to do. You have a middleware component that connects to your SQL backend and provides a programmable API to access your SQL data via web services. This abstracts your actual data store from anyone who wants to program against it, and third parties can simply call into managed APIs to get the data they need. No one needs to run some SQL statement or stored proc that might change in the next version. There are no SQL injection attacks or the other problems that come with front ends running raw SQL commands. Life is great.

Then, you have a front end written in ASP.NET. This front end calls into your web services to get the data it needs. The problem comes along when the front end (web application) is too tightly coupled with the business layer (web services). The web application might call several of these web methods during various stages of rendering a page. It might call some APIs to validate the user’s credentials, then call a web method that returns a dataset to bind to a control, then get some menu data, etc. For each of these web service calls, a stored procedure is run. Now we pretty much have a 1-to-1 mapping between web service calls and stored procedure calls. For large datasets, a high percentage (like 40-50%) of the page rendering time is simply spent deserializing these datasets across the wire. This is especially inefficient when the web server is running on the same physical machine as the business layer. ASP.NET will open up a socket, connect to IIS, “POST” in a SOAP envelope, get the results, deserialize them back into a DataSet, and then return a reference to this object. While this is happening, the thread serving your web page request is blocked waiting for the result (unless you’re using asynchronous web service calls, but those are a huge challenge to design properly). Yes, it’s true that remoting can solve some of these problems, as can Indigo. If you have more users hitting your site than you have available threads, this can cause all sorts of perf problems. Meanwhile, until the next web service call comes along, your business layer machine is sitting around twiddling its thumbs.
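
To make the pattern concrete, here is a minimal sketch of the kind of tightly coupled page I’m describing. The proxy class, web methods, and controls are all hypothetical stand-ins for a generated web service proxy, but the shape is typical: every call is a full SOAP round trip, and each one maps to a stored procedure on the backend.

    using System;
    using System.Data;

    // "BusinessServices" stands in for a generated web service proxy; the
    // controls and method names are made up for illustration.
    public partial class ProductsPage : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            BusinessServices proxy = new BusinessServices();

            // Round trip #1: open a socket to IIS, POST a SOAP envelope,
            // block until the response arrives -- just to check credentials.
            if (!proxy.ValidateUser(User.Identity.Name))
            {
                Response.Redirect("Login.aspx");
                return;
            }

            // Round trip #2: an entire DataSet is serialized to XML on one
            // side and deserialized on the other, even when both tiers run
            // on the same physical machine.
            MenuList.DataSource = proxy.GetMenuData();
            MenuList.DataBind();

            // Round trip #3: same cost again; the worker thread serving
            // this request blocks for the duration of every call.
            ProductGrid.DataSource = proxy.GetProducts(Request.QueryString["cat"]);
            ProductGrid.DataBind();
        }
    }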

When this architecture was designed, the ideal situation was that the higher tiers (meaning the data layer and business layer) were also the lesser-used tiers. The business layer would load in all the data it needed to run, cache it, and provide an in-memory representation of the data to work with. At an appropriate time, the data could be written back to the database. The presentation layer would request the data it needed from the business layer and then render it into a form suitable to display to the user. In other words, calling into your business layer wouldn’t necessarily mean a call into your data layer unless new data was needed.
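
As a sketch of that ideal (the class and helper names are mine, not from any framework): reads come from an in-memory cache, and only a cache miss costs a trip to the data layer.

    using System;
    using System.Data;
    using System.Web;
    using System.Web.Caching;

    // A minimal sketch of the "ideal" business layer described above:
    // reads are served from an in-memory cache, and the data layer is only
    // touched when the cached copy is missing or has expired.
    public static class CatalogCache
    {
        public static DataSet GetCatalog()
        {
            Cache cache = HttpRuntime.Cache;
            DataSet catalog = (DataSet)cache["catalog"];

            if (catalog == null)
            {
                // Only a cache miss pays for a trip to the data layer.
                // CatalogData.LoadFromDatabase() is a hypothetical helper.
                catalog = CatalogData.LoadFromDatabase();
                cache.Insert("catalog", catalog, null,
                    DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
            }

            return catalog;
        }
    }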

For load-balancing reasons, everyone seems to want their middleware components to be stateless, so they don’t have to worry about caching data or about cached data being out of sync with the database.

Proper abstraction means that a single call doesn’t drill down into 80 different layers every time it’s made; otherwise, why even have the abstraction? The same goes for “abstracting” your application into logical layers. If every call is going to drill down into every single layer every time, why have the layers?

I think a better approach for certain applications, especially those with very simple business logic (i.e., just grab some data from SQL Server), is to write business objects as a sort of “data adapter.” My data adapter knows how to get my data from a SQL connection and provides an API for me to read and manipulate it. My data adapter should be in the form of a DLL that my web application links to. The DLL knows how to connect to the SQL Server that it’s configured to use. Multiple instances of my web server, each with its own “data adapter” connecting to a cluster of SQL back-ends, can be running to scale up to the demands of my users. This is still a layer of abstraction, but one without a massive bottleneck marshalling the data across the wire between tiers. If my front end is heavily dependent on this data access API, each call should be as lightning-fast as I can make it, and that means running it in the same process so I can just return a live pointer to my object.
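
Here’s a minimal sketch of such an adapter, assuming a hypothetical dbo.GetOrders stored procedure. It compiles into a DLL that the web application references directly:

    using System.Data;
    using System.Data.SqlClient;

    // In-process "data adapter": a call into it is just a method call plus
    // one TDS round trip to SQL Server -- no SOAP anywhere in the path.
    public class OrderAdapter
    {
        private readonly string _connectionString;

        public OrderAdapter(string connectionString)
        {
            _connectionString = connectionString;
        }

        public DataSet GetOrders(int customerId)
        {
            using (SqlConnection conn = new SqlConnection(_connectionString))
            using (SqlCommand cmd = new SqlCommand("dbo.GetOrders", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@CustomerId", customerId);

                DataSet orders = new DataSet();
                new SqlDataAdapter(cmd).Fill(orders);

                // The caller gets a live reference to this object -- no
                // serialization, no thread blocked on an HTTP response.
                return orders;
            }
        }
    }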

I could also write a separate web service that links to the same “data adapter” DLL but provides a SOAP-based interface to my API. My application doesn’t need to use this, but third parties that wish to program against my backend can easily use this interface. There’s really no reason for my own application to be accessing my own business layer via SOAP; it’s inefficient, and it results in blocked threads and wasted CPU. The amount of effort it takes to marshal a large chunk of data across SOAP is simply not worth the extra scalability, which I’m not convinced I’d really get anyway. In fact, two tightly coupled tiers that are both owned and controlled entirely by me should not be talking to each other in an extensible, human-readable, 7-bit protocol – especially if I’m doing this 30 times for every page I serve up.
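
That SOAP facade can be a thin .asmx service (names hypothetical) that simply forwards to the same adapter DLL; third parties call this, while my own pages call the adapter in-process:

    using System.Configuration;
    using System.Data;
    using System.Web.Services;

    // Thin SOAP wrapper over the in-process adapter, for third parties only.
    [WebService(Namespace = "http://example.com/orders/")]
    public class OrderService : WebService
    {
        [WebMethod]
        public DataSet GetOrders(int customerId)
        {
            // The hypothetical "Orders" connection string lives in Web.config.
            string connString = ConfigurationManager
                .ConnectionStrings["Orders"].ConnectionString;
            return new OrderAdapter(connString).GetOrders(customerId);
        }
    }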

If you’re building such an application, I’d be interested in hearing your comments on architectural decisions and scalability, and, if you’ve done any stress testing with large amounts of data, where your bottlenecks are. It’s my opinion that flattening these tiers out into a single process running ASP.NET code and an adapter that can connect to a data backend is the way to go, especially when the alternative is SOAP between tiers. I’d love to hear your comments!

Mike

Comments

  • Anonymous
    April 06, 2006
    Mike, I totally agree with you that a tier should not be unnecessarily created, and that protocols are commonly the problem.

    It's not always the part of the protocol you think, though.  I have a pet peeve I want to air since the topic came up - 4k and 8k write buffers used in network communications.

    The problem is that ethernet, for physical (and sometimes historical) reasons, has a maximum packet size of 1514 bytes (plus some layer 2 pad), which translates to 1460 bytes of TCP payload.  So, when you're communicating over TCP on ethernet (which you almost always are), you are sending 1460 bytes of payload at a time.

    TCP has a 'feature' called the acknowledgment timer, which is designed to conserve network resources.  The acknowledgment timer delays acknowledgments for 200ms or until two full segments have arrived.  Let's unwind this puzzle:

    Your 8k buffer sends two packets of 1460 bytes, which get acknowledged immediately; now there are 5272 bytes left in the buffer.  You send two more, and they are acknowledged immediately (the round-trip delay is minimal, this is ethernet).  You have 2352 bytes remaining.  Now, you send one 1460-byte packet and one 892-byte packet.

    Ok, so TCP's delayed acknowledgment timer sits around for 200ms, waiting for another 568 bytes to come along.  Well, it never happens, because your sending buffer is empty.  You've just added 200ms to transfer each 8k!  And this is on a local area network.  I'm not certain, but I would bet that this same thing happens on a local socket connection.

    What's the solution?  There are several:
    1) Alter your buffering strategy - use a sliding-window type buffer system (that's what TCP does).  Writing in blocks may be good for the disk, but it's horrible for the network.
    2) Change your buffer size from 8192 to 8760.  This won't affect your scalability much, but will boost your ethernet performance because you'll always send a complete frame at the end of your buffer (see the sketch after this list).
    3) Turn off the delayed acknowledgment timer (this is a Windows registry setting).
    4) Change the maximum segment size on your local network to a multiple of 512 bytes (1024 is a good number).  This is a handy method if you have a lot of hard-coded 8k buffers in modules that are out of your control.
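
    As a rough sketch of option 2 in C#, assuming you control the socket (6 x 1460 = 8760, per the arithmetic above):

        using System.Net.Sockets;

        static class BufferTuning
        {
            static void Tune(Socket socket)
            {
                // Size the send buffer to a whole number of 1460-byte TCP
                // segments, so the last write of each buffer still fills a
                // complete frame (6 x 1460 = 8760 instead of 8192).
                socket.SendBufferSize = 8760;

                // Nagle's algorithm interacts with the delayed ACK timer;
                // it can also be disabled per socket where appropriate.
                socket.NoDelay = true;
            }
        }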

    To check whether you are having this problem, download Ethereal and look for packets with 568 bytes of payload every sixth data packet (the total size will be around 622 bytes, depending on the padding reported).

  • Anonymous
    April 06, 2006
    The packet size to look for is 892 bytes of payload (946 bytes or so) - but it can vary depending on the specifics.  Look for 3 to 5 large packets, followed by a small packet, followed by a 200ms delay.  If you see that - you're hooked!

  • Anonymous
    April 28, 2006
    Isn't the fact that you are traversing the layers so many times also a problem? It's like writing cursor-based SQL code - you end up running too many SQL calls (one per item in the outer cursor).

    Having a Pres. tier, a middle Business Logic tier, and a Data tier isn't a radically new infrastructure; it's been around for years (IBM's CICS and IMS/DC, MS MTS, SAP R/2, and many others). I'd argue that it's the only way to build a truly scalable app, as you want to be able to scale out the Pres. tier without slamming the data tier - you govern data tier load by restricting the # of middle tier servers...

  • Anonymous
    April 28, 2006
    The middleware<>SQL protocol is TDS (Tabular Data Stream), a protocol that was created by Sybase way back when...

  • Anonymous
    April 28, 2006
    "especially if I’m doing this 30 times for every page I serve up"

    One of the cardinal sins of writing a distributed system is being really chatty between the layers. In most cases, for the initial page load, you should be able to cut the number of trips to the server down to one by combining everything into one method call, as sketched below.
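
    For example (a sketch with hypothetical helper classes), one coarse-grained web method can return everything the initial page load needs in a single DataSet:

        using System.Data;
        using System.Web.Services;

        public class PageService : WebService
        {
            // One round trip instead of three: the returned DataSet carries
            // multiple tables (user info, menu items, page data). The *Data
            // helper classes are hypothetical.
            [WebMethod]
            public DataSet GetInitialPageData(string userName, string category)
            {
                DataSet page = new DataSet("InitialPage");
                page.Tables.Add(UserData.GetUser(userName));
                page.Tables.Add(MenuData.GetMenu());
                page.Tables.Add(ProductData.GetProducts(category));
                return page;
            }
        }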

    Also bear in mind that serialization happens all over the place. If you have out-of-process session state, or you use viewstate, then you have to serialize any objects you store in either of those. Viewstate makes it even worse, as any data that's serialized typically has to be encrypted too.

    So when you're looking at your serialization costs, make sure you know where they come from.
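
    To illustrate the session-state point (a minimal sketch): with the out-of-process StateServer or SQLServer modes, anything stored in Session must be serializable, and the session contents are serialized out at the end of each request and loaded back at the start of the next.

        using System;

        // Without [Serializable], storing this in Session fails once the
        // session mode is StateServer or SQLServer; with it, you pay the
        // serialization cost on every request that touches the session.
        [Serializable]
        public class UserPrefs
        {
            public string Theme;
            public int PageSize;
        }

        // In a page: Session["prefs"] = new UserPrefs();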

  • Anonymous
    June 23, 2006
    What I see no one discussing is that tiering also has disadvantages unrelated to serialization or network protocols.
    A colleague of mine, having just read about tiers, created an application that has the interface, business objects, database objects, and the database as separate tiers. Of course, it was my job to debug and scale that. I had to change/add stored procedures in the database, add/change db objects to use them, add/change business objects to use those, and then actually code!
    My personal opinion is that tiering is useless unless you automate these chores. Right now I am using code generation to address this, and every time the code in the db changes, the code generation tool recreates the entire business+database layer without any intervention on my part.

  • Anonymous
    February 14, 2008
    Good article, with lots of good information and arguments on multi-tier development. But I have to say that for a very large application with a large number of developers, both PC and mainframe, a multi-tier approach works very well, thanks to the ability to componentize (the UI, business layer, and data layer), web services, and development standards: some developers build the UI, some build the business layer, and others code the Oracle and mainframe data access layers. This provides for a fast development cycle. We do have a layered system, and what you described (serialization and so on) does happen, but the performance of our application is very fast. We could gain more performance if we changed things to what you suggest, but we could lose the benefits that I described above, and so far our scalability and performance have been excellent. I worked as an enterprise architect for a large hospital system with 8000+ users, four hospitals, and 30 satellite clinics. We have a multi-tier application portal (built in-house) that consists of a collection of applications, both client-server and ASP.NET. These applications use web services, and for data storage we use Oracle and the mainframe. In summary, our multi-tier layered application has been working really well for us, performance- and development-wise… Thanks.

  • Anonymous
    August 10, 2008
    Hi Mike, can you please tell me: if you're choosing ASP.NET as your front end and SQL Server as your backend, then what technology would be used as the middleware? Reply soon.
