Udostępnij za pośrednictwem


Chapter 8: Data Layer Guidelines

For more details of the topics covered in this guide, see Contents of the Guide.

Contents

  • Overview
  • General Design Considerations
  • Specific Design Issues
  • Technology Considerations
  • Performance Considerations
  • Security Considerations
  • Deployment Considerations
  • Design Steps for the Data Layer
  • Relevant Design Patterns
  • Additional Resources

Overview

This chapter describes the key guidelines for designing the data layer of an application. It will help you to understand how the data layer fits into the typical layered application architecture, the components it usually contains, and the key issues you face when designing the data layer. You will see guidelines for design, the recommended design steps, relevant design patterns, and technology options. Figure 1 shows how the data layer fits into a typical application architecture.

Ee658127.14dc2cad-2895-4a24-8e47-280db5482fef(en-us,PandP.10).png

Figure 1

A typical application showing the data layer and the components it may contain

The data layer may include the following:

  • Data Access components. These components abstract the logic required to access the underlying data stores. They centralize common data access functionality in order to make the application easier to configure and maintain. Some data access frameworks may require the developer to identify and implement common data access logic in separate reusable helper or utility data access components. Other data access frameworks, including many Object/Relational Mapping (O/RM) frameworks, implement such components automatically, reducing the amount of data access code that the developer must write.
  • Service agents. When a business component must access data provided by an external service, you might need to implement code to manage the semantics of communicating with that particular service. Service agents implement data access components that isolate the varying requirements for calling services from your application, and may provide additional services such as caching, offline support, and basic mapping between the format of the data exposed by the service and the format your application requires.

For more information about the components commonly used in the data layer, see Chapter 10 "Component Guidelines." For more information about creating data access components, see Chapter 15 "Designing Data Components."

General Design Considerations

Your data access layer must meet the requirements of your application, perform efficiently and securely, and be easy to maintain and extend as business requirements change. When designing the data layer, consider the following general design guidelines:

  • Choose an appropriate data access technology. The choice of data access technology depends on the type of data you must handle, and how you intent to manipulate that data within the application. Certain technologies are better suited to specific scenarios. The Appendix "Data Access Technology Matrix" at the end of this guide discusses these options and enumerates the benefits and considerations for each data access technology.
  • Use abstraction to implement a loosely coupled interface to the data access layer. This can be accomplished by defining interface components, such as a gateway with well-known inputs and outputs, which translate requests into a format understood by components within the layer. In addition, you can use interface types or abstract base classes to define a shared abstraction that must be implemented by interface components. For more information about layer abstraction, see Chapter 5 "Layered Application Guidelines."
  • Encapsulate data access functionality within the data access layer. The data access layer should hide the details of data source access. It should be responsible for managing connections, generating queries, and mapping application entities to data source structures. Consumers of the data access layer interact through abstract interfaces using application entities such as custom objects, TypedDataSets, and XML, and should have no knowledge of the internal details of the data access layer. Separating concerns in this way assists in application development and maintenance.
  • Decide how to map application entities to data source structures. The type of entity you use in your application is the main factor in deciding how to map those entities to data source structures. Common design approaches follow the Domain Model or Table Module patterns or use Object/Relational Mapping (O/RM) frameworks, though you may implement business entities using different formats. You must identify a strategy for populating the business entities or data structures from the data source and making them available to the business layer or presentation layer of the application. For more information about the Domain Model or Table Module patterns, see the section "Relevant Design Patterns" near the end of this chapter. For more information about business entities and data formats, see Chapter 13 "Designing Business Entities."
  • Consider consolidating data structures. If you are exposing data through services, consider using Data Transfer Objects (DTOs) to help you organize the data into unified structures. In addition, DTOs encourage coarse-grained operations while providing a structure that is designed to move data across different boundary layers. DTOs can also span business entities for aggregate operations. If you are using the Table Data Gateway or Active Record pattern, you may consider using a DataTable to represent the data.
  • Decide how you will manage connections. As a rule, the data access layer should create and manage all connections to all data sources required by the application. You must choose an appropriate method for storing and protecting connection information, perhaps by encrypting sections of the configuration file or limiting storage of configuration information to the server, in order to conform to corporate security requirements. For more information, see Chapter 15 "Designing Data Components."
  • Determine how you will handle data exceptions. The data access layer should catch and (at least initially) handle all exceptions associated with data sources and CRUD (Create, Read, Update, and Delete) operations. Exceptions concerning the data itself, and data source access and timeout errors, should be handled in this layer and passed to other layers only if the failures affect application responsiveness or functionality.
  • Consider security risks. The data access layer should protect against attacks that try to steal or corrupt data, and protect the mechanisms used to gain access to the data source. For example, sanitize error and exception information so that data source information is not revealed, and use least privilege accounts to restrict privileges to only those needed to perform the operations required by the application. Even if the data source itself has the ability to limit privileges, security should be implemented in the data access layer as well as in the data source. Database access should be through parameterized queries to prevent SQL injection attacks succeeding. Never use string concatenation to build dynamic queries from user input data.
  • Reduce round trips. Consider batching commands into a single database operation.
  • Consider performance and scalability objectives. Scalability and performance objectives for the data access layer should be taken into account during design. For example, when designing an Internet-based commerce application, data layer performance is likely to be a bottleneck for the application. When data layer performance is critical, use profiling to understand and then reduce or resolve expensive data operations.

Specific Design Issues

There are several common issues that you must consider as your develop your design. These issues can be categorized into specific areas of the design. The following sections provide guidelines for the common areas where mistakes are most often made:

  • Batching
  • Binary Large Objects (BLOBs)
  • Connections
  • Data Format
  • Exception Management
  • Object Relational Mapping
  • Queries
  • Stored Procedures
  • Stored Procedures vs. Dynamic SQL
  • Transactions
  • Validation
  • XML

Batching

Batching database commands can improve the performance of your data layer. Each request to the database execution environment incurs an overhead. Batching can reduce the total overhead by increasing throughput and decreasing latency. Batching similar queries can improve performance because the database caches and can reuse a query execution plan for a similar query. Consider the following guidelines when designing batching:

  • Consider using batched commands to reduce round trips to the database and minimize network traffic. However, for maximum benefit, only batch similar queries. Batching dissimilar or random queries does not provide the same level of reduction in overhead.
  • Consider using batched commands and a DataReader to load or copy multiple sets of data. However, when loading large volumes of file-based data into the database, consider using database bulk copy utilities instead.
  • Do not perform transactions on long-running batch commands that will lock database resources.

Binary Large Objects

When data is stored and retrieved as a single stream, it can be considered to be a binary large object, or BLOB. A BLOB may have structure within it, but that structure is not apparent to the database that stores it or the data layer that reads and writes it. Databases can store the BLOB data or can store pointers to them within the database. The BLOB data is usually stored in a file system if not stored directly in the database. BLOBs are typically used to store image data, but can also be used to store binary representations of objects. Consider the following guidelines when designing for BLOBs:

  • Consider whether you need to store BLOB data in a database. Modern databases are much better at handling BLOB data, providing you choose an appropriate column data type, and can provide maintainability, versioning, operations, and storage of related metadata. However, consider if it is more practical to store it on disk and store just a link to the data in the database.
  • Consider using BLOBs to simplify synchronization of large binary objects between servers.
  • Consider whether you will need to search the BLOB data. If so, create and populate other searchable database fields instead of parsing the BLOB data.
  • When retrieving the BLOB, cast it to the appropriate type for manipulation within your business or presentation layer.

Connections

Connections to data sources are a fundamental part of the data layer. All data source connections should be managed by the data layer. Creating and managing connections uses valuable resources in both the data layer and the data source. To maximize performance and security, consider the following guidelines when designing for data layer connections:

  • In general, open connections as late as possible and close them as early as possible. Never hold connections open for excessive periods.
  • Perform transactions through a single connection whenever possible.
  • Take advantage of connection pooling by using a trusted subsystem security model, and avoiding impersonation or the use of individual identities if possible.
  • For security reasons, avoid using a System or User Data Source Name (DSN) to store connection information.
  • Consider if you should design retry logic to manage the situation where the connection to the data source is lost or times out. However, if the underlying cause is something like a resource contention issue, retrying the operation may exacerbate the problem leading to scaling issues. See Chapter 15 "Designing Data Components" for more information.

Data Format

Choosing the appropriate data format provides interoperability with other applications, and facilitates serialized communications across different processes and physical machines. Data format and serialization are also important in order to allow the storage and retrieval of application state by the business layer. Consider the following guidelines when designing your data format:

  • Consider using XML for interoperability with other systems and platforms, or when working with data structures that can change over time.
  • Consider using DataSets for disconnected scenarios in simple CRUD-based applications.
  • If you must transfer data across physical boundaries, consider serialization and interoperability requirements. For example, consider how you will serialize custom business objects, how you will translate them into Data Transfer Objects (DTOs) where this is a requirement, and what formats the receiving layer can accept.

For more information on data formats, see Chapter 15 "Designing Data Components." For information on designing and using components in your application, see Chapter 10 "Component Guidelines."

Exception Management

Design a centralized exception management strategy so that exceptions are caught and thrown consistently in your data layer. If possible, centralize exception handling logic in components that implement crosscutting concerns in your application. Pay particular attention to exceptions that propagate through trust boundaries and to other layers or tiers. Design for unhandled exceptions so they do not result in application reliability issues or exposure of sensitive application information. Consider the following guidelines when designing your exception management strategy:

  • Identify the exceptions that should be caught and handled in the data access layer. For example, deadlocks and connection issues can often be resolved within the data layer. However, some exceptions, such as and concurrency violations, should be surfaced to the user for resolution.
  • Design an appropriate exception propagation strategy. For example, allow exceptions to propagate to boundary layers where they can be logged and transformed as necessary before passing them to the next layer. Consider including a context identifier so that related exceptions can be associated across layers when performing root cause analysis of errors and faults.
  • Consider implementing a retry process for operations where data source errors or timeouts occur, where it is safe to do so.
  • Ensure that you catch exceptions that will not be caught elsewhere (such as in a global error handler), and clean up resources and state after an exception occurs.
  • Design an appropriate logging and notification strategy for critical errors and exceptions that logs sufficient detail from exceptions and does not reveal sensitive information.

Object Relational Mapping

When designing an object oriented (OO) application, consider the impedance mismatch between the OO model and the relational model, and the factors that can make it difficult to translate between them. For example, encapsulation in OO designs, where fields are hidden, may contradict the public nature of properties in a database. Other examples of impedance mismatch include differences in the data types, structural differences, transactional differences, and differences in how data is manipulated. The two common ways to handle these mismatches are design patterns for data access such as Repository, and Object/Relational Mapping (O/RM) tools. A Domain Driven Design approach, which is based on modeling entities based on objects within a domain, is often an appropriate choice. For information about Domain Driven Design, see Chapter 3, "Architectural Patterns and Styles" and Chapter 13 "Designing Business Entities."

Consider the following guidelines when designing for object relational mapping:

  • Consider using a framework that provides an Object/Relational Mapping (O/RM) layer between domain entities and the database. Modern O/RM solutions are available that can significantly reduce the amount of custom code required.
  • If you are working in a greenfield environment, where you have full control over the database schema, you can use an O/RM tool to generate a schema to support the defined object model, and to provide a mapping between the database and domain entities.
  • If you are working in a brownfield environment, where you must work with an existing database schema, you can use an O/RM tool to help you to map between the domain model and the existing relational model.
  • If you are working with a smaller application or do not have access to O/RM tools, implement a common data access pattern such as Repository. With the Repository pattern, the repository objects allow you to treat domain entities as if they were located in memory.
  • When working with Web applications or services, group entities and support options that will partially load domain entities with only the required data—a process usually referred to as lazy loading. This allows applications to handle the higher user load required to support stateless operations, and limit the use of resources by avoiding holding initialized domain models for each user in memory.

Queries

Queries are the primary data manipulation operations for the data layer. They are the mechanism that translates requests from the application into CRUD actions on the database. As queries are so essential, they should be optimized to maximize database performance and throughput. Consider the following guidelines when using queries in your data layer:

  • Use parameterized SQL queries and typed parameters to mitigate security issues and reduce the chance of SQL injection attacks succeeding. Do not use string concatenation to build dynamic queries from user input data.
  • Consider using objects to build queries. For example, implement the Query Object pattern or use the parameterized query support provided by ADO.NET. Also consider optimizing the data schema in the database for query execution.
  • When building dynamic SQL queries, avoid mixing business processing logic with logic used to generate the SQL statement. Doing so can lead to code that is very difficult to maintain and debug.

Stored Procedures

In the past, stored procedures represented a performance improvement over dynamic SQL statements. However, with modern database engines, the performance of stored procedures and dynamic SQL statements (using parameterized queries) are generally similar. When considering the use of stored procedures, the primary factors are abstraction, maintainability, and your environment. This section contains guidelines to help you design your application when using stored procedures. For guidance on choosing between using stored procedures and dynamic SQL statements, see the section that follows.

In terms of security and performance for stored procedures, the primary guidelines are to use typed parameters and avoid dynamic SQL within the stored procedure. Parameters are one of the factors that influence the use of cached query plans instead of rebuilding the query plan from scratch. When parameter types and the number of parameters change, new query execution plans are generated, which can reduce performance. Consider the following guidelines when designing stored procedures:

  • Use typed parameters as input values to the procedure and output parameters to return single values. Consider using XML parameters or table-value parameters for passing lists or tabular data. Do not format data for display in stored procedures; instead, return the appropriate types and perform formatting in the presentation layer.
  • Use parameter or database variables if it is necessary to generate dynamic SQL within a stored procedure. However, bear in mind that using dynamic SQL in stored procedures can affect performance, security, and maintainability.
  • Avoid the creation of temporary tables while processing data. However, if temporary tables must be used, consider creating them in memory instead of on disk.
  • Implement appropriate error handling designs, and return errors that the application code can handle.

Stored Procedures vs. Dynamic SQL

The choice between stored procedures and dynamic SQL focuses primarily on the use of SQL statements dynamically generated in code instead of SQL implemented within a stored procedure in the database. When choosing between stored procedures and dynamic SQL, you must consider the abstraction requirements, maintainability, and environment constraints. In addition, in many cases, the choice between stored procedures and dynamic SQL queries includes developer preference or skill set.

The main advantages of stored procedures are that they provide an abstraction layer to the database, which can minimize the impact on application code when the database schema changes. Security is also easier to implement and manage because you can restrict access to everything except the stored procedure, and take advantage of fine-grained security features supported by most databases (though be aware that this may affect your ability to take advantage of connection pooling).

The main advantages of dynamic SQL statements are that they are often considered more flexible than stored procedures, and can enable more rapid development. Many Object/Relational Mapping (O/RM) frameworks generate dynamic queries for you, considerably reducing the amount of code developers must write.

Consider the following guidelines when choosing between stored procedures and dynamic SQL:

  • If you have a small application that has a single client and few business rules, dynamic SQL is often the best choice.
  • If you have a larger application that has multiple clients, consider how you can achieve the required abstraction. Decide where that abstraction should exist: at the database in the form of stored procedures, or in the data layer of your application in the form of data access patterns or O/RM products.
  • For data-intensive operations, stored procedures allow you to perform the operations closer to the data, which can improve performance.
  • To minimize application code changes when the database schema changes, you might consider using stored procedures to provide access to the database. This can help to isolate and minimize changes to application code during schema normalization or optimization. Changes to inputs and outputs of a stored procedure can affect application code, but these changes can often be isolated in specific components that access the stored procedure. Object/Relational Mapping (O/RM) frameworks can also help you to isolate and minimize application code changes when schemas are updated.
  • When considering dynamic SQL queries, you should understand the impact that changes to database schemas will have on your application. As a result, you should implement the data access layer in such a way that it decouples business components from the execution of database queries. Several patterns, such as Query Object and Repository, can be used to provide this separation. Object/Relational Mapping (O/RM) frameworks can help to achieve a clean separation between your business components and the execution of database queries.
  • Consider the team you have for development of the application. If you do not have a team that is familiar with database programming, consider tools or patterns that are more familiar to your development staff.
  • Consider debugging support. Dynamic SQL is easier for application developers to debug.

Transactions

A transaction is an exchange of sequential information and associated actions that are treated as an atomic unit in order to satisfy a request and ensure database integrity. A transaction is only considered complete if all information and actions are complete, and the associated database changes are made permanent. Transactions support undo (rollback) database actions following an error, which helps to preserve the integrity of data in the database.

It is important to identify the appropriate concurrency model and determine how you will manage transactions. You can choose between an optimistic model and a pessimistic model for concurrency. With optimistic concurrency, locks are not held on data and updates require code to check, usually against a timestamp, that the data has not changed since it was last retrieved. With pessimistic concurrency, data is locked and cannot be updated by another operation until the lock is released.

Consider the following guidelines when designing transactions:

  • Consider transaction boundaries, so that retries and composition are possible, and enable transactions only when you need them. Simple queries may not require an explicit transaction, but you should make sure that you are aware of your database's default transaction commit and isolation level behavior. By default Microsoft SQL Server® database executes each individual SQL statement as an individual transaction (auto-commit transaction mode).
  • Keep transactions as short as possible to minimize the amount of time that locks are held. Try to avoid using locks for long-running transactions, or locking during access to shared data, which may block access to data by other code. Avoid the use of exclusive locks, which can cause contention and deadlocks
  • Use the appropriate isolation level, which defines how and when changes become available to other operations. The tradeoff is data consistency versus contention. A high isolation level will offer higher data consistency at the price of overall concurrency. A lower isolation level improves performance by lowering contention at the cost of consistency.
  • If you are using the System.Transactions namespace classes, consider using the implicit model provided by the TransactionScope object in the System.Transactions namespace. Although implicit transactions are not as fast as manual, or explicit, transactions, they are easier to develop and lead to middle tier solutions that are flexible and easy to maintain. When using manual transactions, consider implementing the transaction within a stored procedure.
  • Where you cannot apply a commit or rollback, or if you use a long-running transaction, implement compensating methods to revert the data store to its previous state in case an operation within the transaction fails.
  • If you must execute multiple queries against a database, consider the use of multiple active result sets (MARS), which provides support for multiple forward only, read only results sets and allows multiple queries to be executed using the same connection. MARS can be useful in transaction-heavy concurrent applications.

Validation

Designing an effective input and data validation strategy is critical for the security of your application. Determine the validation rules for data received from other layers and from third-party components, as well as from the database or data store. Understand your trust boundaries so that you can validate any data that crosses these boundaries. Consider the following guidelines when designing a validation strategy:

  • Validate all data received by the data layer from all callers. Ensure that you correctly handle NULL values, and filter out invalid characters.
  • Consider the purpose to which data will be put when designing validation. For example, user input used in the creation of dynamic SQL should be examined for characters or patterns that occur in SQL injection attacks.
  • Return informative error messages if validation fails.

For more information on validation techniques, see Chapter 17 "Crosscutting Concerns."

XML

Extensible Markup Language (XML) is useful for interoperability and for maintaining data structure outside of the database. For performance reasons, be careful when using XML for very large volumes of data. If you must handle large volumes of data as XML, use attribute-based schemas where data values are stored as attributes, instead of element-based schemas that store the data values as the values of elements, and are consequently larger. Consider the following guidelines when designing for the use of XML:

  • Consider using XML readers and writers to access XML formatted data, especially for extremely large sets of XML data. If you need to interact with a relational database, consider using objects that support this functionality, such as the ADO.NET DataSet. Use common settings for whitespace and comment handling on XML readers and writers.
  • Consider using an XML schema to define formats and to provide validation for data stored and transmitted as XML. Consider using custom validators for complex data parameters within your XML schema. However, bear in mind that validation will impose a performance penalty.
  • Store XML in typed columns in the database, if available, for maximum performance. Set up indexes (if your database supports them) if you will be regularly querying the XML data.

Technology Considerations

The following guidelines will help you to choose an appropriate implementation technology and techniques, depending on the type of application you are designing and the requirements of that application:

  • If you require basic support for queries and parameters, consider using ADO.NET objects directly.
  • If you require support for more complex data access scenarios, or want to simplify your data access code, consider using the Enterprise Library Data Access Application Block. For more details about Enterprise Library, see Appendix F "patterns & practices Enterprise Library".
  • If you are building a data driven Web application with pages based on the data model of the underlying database, consider using ASP.NET Dynamic Data.
  • If you want to manipulate XML-formatted data, consider using the classes in the System.Xml namespace and its subsidiary namespaces, or Linq to XML (XLinq).
  • If you are using ASP.NET to create user interfaces, consider using a DataReader to access data to maximize rendering performance. DataReaders are ideal for read-only, forward-only operations in which each row is processed quickly.
  • If you are accessing SQL Server, consider using classes in the ADO.NET SqlClient namespace to maximize performance.
  • If you are accessing SQL Server 2008, consider using a FILESTREAM for greater flexibility in the storage of and access to BLOB data.
  • If you are designing an object-oriented business layer based on the Domain Model pattern, consider using an Object/Relational Mapping (O/RM) framework, such as the ADO.NET Entity Framework or the open source NHibernate framework (see Additional Resources at the end of this chapter for more information).

For guidance on choosing a data access technology, see Chapter 15 "Designing Data Components." For information about the data access technologies available on the Microsoft platform, see the Appendix C "Data Access Technology Matrix."

Performance Considerations

Performance is a function of both your data layer design and your database design. Consider both together when tuning your system for maximum data throughput. Consider the following guidelines when designing for performance:

  • Use connection pooling and tune performance based on results obtained by running simulated load scenarios.
  • Consider tuning isolation levels for data queries. If you are building an application with high-throughput requirements, special data operations may be performed at lower isolation levels than the rest of the transaction. Combining isolation levels can have a negative impact on data consistency, so you must carefully analyze this option on a case by case basis.
  • Consider batching commands to reduce the number of round trips to the database server.
  • Consider using optimistic concurrency with nonvolatile data to mitigate the cost of locking data in the database. This avoids the overhead of locking database rows, including the connection that must be kept open during a lock.
  • If using a DataReader, use ordinal lookups for faster performance.

Security Considerations

The data layer should protect the database against attacks that try to steal or corrupt data. It should allow only as much access to the various parts of the data source as is required. The data layer should also protect the mechanisms used to gain access to the data source. Consider the following guidelines when designing for security:

  • When using SQL Server, consider using Windows authentication with an implementation of the trusted subsystem model. For information on the trusted subsystem model, see Chapter 19 "Physical Tiers and Deployment."
  • Encrypt connection strings in configuration files instead of using a System or User Data Source Name (DSN).
  • When storing passwords, use a salted hash instead of an encrypted version of the password.
  • Require that callers send identity information to the data layer for auditing purposes.
  • Use parameterized SQL queries and typed parameters to mitigate security issues and reduce the chance of SQL injection attacks succeeding. Do not use string concatenation to build dynamic queries from user input data.

Deployment Considerations

When deploying the data layer, the goal of the software architect is to consider the performance and security issues in the production environment. Consider the following guidelines when deploying your data layer:

  • Locate the data access layer on the same tier as the business layer to improve application performance unless scalability or security concerns prevent this.
  • If you must support a remote data access layer, consider using the TCP protocol to improve performance.
  • Consider locating the data access layer on a different server to the database. The physical characteristics of a database server are often optimized for that role, and will rarely match the optimum operating characteristics for the data layer. The combination of both on one physical tier is extremely likely to reduce application performance.

Design Steps for the Data Layer

A correct approach to designing the data layer will reduce development time and assist in maintenance of the data layer after the application is deployed. This section briefly outlines an effective design approach for the data layer. Perform the following key steps when designing your data layer:

  1. Create an overall design for your data access layer. Identify data source constraints by determining if you are working with a greenfield or brownfield environment, and determine the associated restrictions. In addition, if any new development is required, consider how it will coexist with the data source in its current state.
    • In a greenfield scenario, where there is no prior work related to the data source, you have full control over the schema used by your data source. Restrictions are based on the data source itself.
    • In a brownfield scenario, you do not have control over data source schemas, and the data source could be anything from a database to gateway components used to interact with existing components. You must understand the complexity and constraints of the existing business. For example, you must determine if there a predefined operational data store or other restriction that will prevent you from changing the existing schema. However, you can usually add new tables or views to an existing schema. Also, determine if you are interacting with the data layer using Web services or with a legacy application using gateway components. In these cases, you will be restricted to operations defined in the Web service contract or in the interface exposed by the gateway components.
  2. Choose the entity types you need. Data access components deal with entities. Entities are used to contain and manage the data used by your application, and you should consider including any data validation code you require within the entities. Choosing an appropriate data type and format for your business entities is also important as it determines how interoperability and serialization requirements are met. For guidance on choosing the type of entities to use, and the types commonly used in business and data components, see Chapter 13 "Designing Business Entities." Consider the following while choosing and implementing the appropriate data format:
    • If you must support disconnected scenarios in simple CRUD-based applications, then use DataSets or individual DataTables. The most common approach is to use the ADO.NET provider. This is ideal when you are working with an existing application that already uses the ADO.NET providers. If you are developing a new application, you can use LINQ to Datasets to populate DataSets using LINQ queries.
    • If your data access layer will be accessed by other applications and you require interoperability with other systems and platforms, use an XML format.
    • If application maintainability is important, use custom business entities. This requires additional code to map the entities to database operations; however, Object/Relational Mapping (O/RM) solutions can reduce the amount of custom code required. Choose the ADO.NET Entity Framework, or another O/RM framework such as the open source NHibernate framework, if you need more flexibility.
    • Implement entities by deriving them from a base class that provides basic functionality and encapsulates common tasks. However, be careful not to overload the base class with unrelated operations, which would reduce the cohesiveness of entities derived from the base class and may result in maintainability and performance issues.
    • Design entities to rely on data access logic components for database interaction. Centralize implementation of all data access policies and related business logic. For example, if your business entities access SQL Server databases directly, all applications deployed to clients that use the business entities will require SQL connectivity and logon permissions.
  3. Choose your data access technology. Identify the functionality required for your data access logic and choose a technology that meets these requirements. For information on the range of data access technologies available on the Microsoft platform, see Appendix C "Data Access Technology Matrix."
  4. Design your data access components. Enumerate the data sources that you will access and decide on the method of access for each data source. Determine whether helper components are required or desirable to simplify development and maintenance of data access components. Finally, identify relevant design patterns. For example, consider using the Table Data Gateway, Query Object, Repository, and other patterns. For more information, see Chapter 15 "Designing Data Components."
  5. Design your service agents. Use the appropriate tool to add a service reference. This will generate a proxy and the data classes that represent the data contract from the service. Then determine how the service will be used in your application. For most applications, you should access the functionality and data provided by the service agents through data access components, which will provide a consistent interface regardless of the data source. For smaller applications, the business layer—or even the presentation layer—may access the service agent directly.

Relevant Design Patterns

Key patterns are organized by categories, as detailed in the following table. Consider using these patterns when making design decisions for each category.

Category

Relevant patterns

General

Active Record. Include a data access object within a domain entity.

Data Mapper. Implement a mapping layer between objects and the database structure that is used to move data from one structure to another while keeping them independent.

Data Transfer Object. An object that stores the data transported between processes, reducing the number of method calls required.

Domain Model. A set of business objects that represents the entities in a domain and the relationships between them.

Query Object. An object that represents a database query.

Repository. An in-memory representation of a data source that works with domain entities.

Row Data Gateway. An object that acts as a gateway to a single record in a data source.

Table Data Gateway. An object that acts as a gateway to a table or view in a data source and centralizes all of the select, insert, update, and delete queries.

Table Module. A single component that handles the business logic for all rows in a database table or view.

Batching

Parallel Processing. Allow multiple batch jobs to run in parallel to minimize the total processing time.

Partitioning. Partition multiple large batch jobs to run concurrently.

Transactions

Capture Transaction Details. Create database objects, such as triggers and shadow tables, to record changes to all tables belonging to the transaction.

Coarse Grained Lock. Lock a set of related objects with a single lock.

Implicit Lock. Use framework code to acquire locks on behalf of code that accesses shared resources.

Optimistic Offline Lock. Ensure that changes made by one session do not conflict with changes made by another session.

Pessimistic Offline Lock. Prevent conflicts by forcing a transaction to obtain a lock on data before using it.

Transaction Script. Organize the business logic for each transaction in a single procedure, making calls directly to the database or through a thin database wrapper.

For more information on the Domain Model, Table Module, Coarse-Grained Lock, Implicit Lock, Transaction Script, Active Record, Data Mapper, Data Transfer Object, Optimistic Offline Locking, Pessimistic Offline Locking, Query Object, Repository, Row Data Gateway, and Table Data Gateway patterns, see Fowler, Martin. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002. Or at http://martinfowler.com/eaaCatalog/.

For more information on the Capture Transaction Details pattern, see "Data Patterns" at https://msdn.microsoft.com/en-us/library/ms998446.aspx.

Additional Resources

For more general data access guidelines and information, see the following resources: