An Azure Application Architecture for Security, Scalability, Performance, Redundancy and Reliability

This article documents my experience of using Azure to build a global system with security, scalability, performance, redundancy and reliability among its design goals. It does not try to capture every architectural aspect; instead, it uses selected system features to show how Azure can help achieve some of those goals.

Background

The system discussed is a ride-sharing application. Picture it as Uber for the common people. While Uber is a centrally managed taxi/limousine application that charges fares on demand and caters to high-end customers, this is a peer-to-peer application that lets people with vehicles share their available seats with people who want to go the same way. Fares are voluntary. In short, it is a Social Transportation System that creates a global Social Transportation Ecosystem with Azure as the technology platform.

Design Goals

 

While the system has a large number of design goals, here is a short list of some of the more important Azure-related ones:

  • Real-time – When a user creates a new trip request, either as a driver or as a passenger, it triggers the Matching Service (engine) to find compatible matches: a driver needs to find potential passenger(s) and vice versa. When a match is found, the passenger’s profile and the matched trip are pushed to the driver’s device, and the driver’s profile and the matched trip are pushed to the passenger(s). When the driver and the passenger(s) choose to disclose their locations during meet-up, location information also needs to be pushed to each other in real time.

  • Device independent – A user may use a tablet at home to register a trip request, a laptop at the office to get match updates, and then a phone to locate and communicate with the other party at meet-up. The application front-end has to be device independent, and data and notifications have to reach whichever device the user happens to be using at the time.

  • Global – Users come from all over the world. Data such as user profile, trip, match and location information needs to be kept in the user’s chosen language, and services and data need to be hosted regionally for performance reasons.

  • Secure – Users use the application over the Internet, so security is paramount. Data, access channels and service access points all need to be secure.

  • Fast – Performance is critical to user experience. The system needs to be designed for speed.

  • Scalable – The system needs to scale to a large number of global users. Usage and load can change over time and over the course of a day, so the system needs the ability to scale on demand.

  • Redundant and Reliable – The system needs to run 24/7 without interruption. It needs highly reliable operations with redundant backup resources.

  • Maintainable – The system needs to be highly maintainable for effective and efficient operations.

 

Application Architecture

The system comprises many sub-systems, only a few of which are presented in this article. The following diagram shows a 3-tiered system architecture hosted in 3 datacenters (Western US, Western Europe and Eastern Asia):

Data Tier

SQL, Message Queue, Data Cache and Storage data is kept in the region where it belongs. For example, all European users and their European trips and matches are stored in the Western Europe datacenter. Web Services in that datacenter typically access data in the same datacenter. But when a Japanese user travels to Europe and makes a trip request for North America, the web service in Western Europe accesses the East Asia datacenter for account information and the Western US datacenter for trip and match information. Reference data, active profiles and location information are stored in Cache for speedy access. Backup data is stored in Blobs. Messages are stored in Message Queues for processing.
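The routing rule in the paragraph above can be sketched as a small lookup. This is only an illustration under stated assumptions: the region identifiers, the `User` and `TripRequest` types and the `regions_for_request` function are hypothetical names, and the sketch assumes that account data lives in the user's home region while trip and match data live in the region that covers the trip.

```python
from dataclasses import dataclass

# Hypothetical region identifiers for the three datacenters named in the article.
WEST_US = "west-us"
WEST_EUROPE = "west-europe"
EAST_ASIA = "east-asia"


@dataclass(frozen=True)
class User:
    user_id: str
    home_region: str  # region where the account was registered (assumption)


@dataclass(frozen=True)
class TripRequest:
    user: User
    trip_region: str  # region that covers the trip's geography (assumption)


def regions_for_request(request: TripRequest, serving_region: str) -> dict:
    """Return which datacenter each kind of data would be read from."""
    account_region = request.user.home_region
    trip_region = request.trip_region
    return {
        "account": account_region,
        "trip_and_match": trip_region,
        # Regions other than the one serving the call, i.e. the remote lookups.
        "cross_region": sorted({account_region, trip_region} - {serving_region}),
    }


# The example from the text: a Japanese user in Europe requests a North American trip.
traveller = User(user_id="u-1", home_region=EAST_ASIA)
plan = regions_for_request(TripRequest(traveller, WEST_US), serving_region=WEST_EUROPE)
```

For the travelling user, both lookups are remote, which is the exceptional case; a European user requesting a European trip would produce an empty `cross_region` list.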

  • SQL Database – Databases and tables are designed so that they can be easily partitioned as volume grows. For example, the account information table can easily be split by country. The trip information table is split by geographic coordinates to keep each table small for speedy match calculations. Depending on usage, there can be one table to store trip data for the whole state of Montana, but one table just for the city of New York. Historical data is removed and archived. Databases can also be split when size or performance thresholds are reached.

  • Cache Role – When a user is actively using the system, that user’s profile information is accessed more frequently by other users. This information is placed in cache for speedy retrieval.

  • Shared Cache – Certain static information, such as country names, labels, and status and error messages in different languages, is stored in Shared Cache within the Web Service. When a user logs in and the information on the device is outdated, a new version can be downloaded quickly.

  • Storage Blob – Periodic database backups are stored in Storage Blobs. Logs, diagnostic information, etc. are also stored in blobs.

  • Service Bus/Message Queue – When a user makes a trip request and the matching engine finds a match, Matching Service generates match messages, one for the driver and one for the passenger, and places them in a Match message queue. When a user changes his/her profile information, Account Service also generates a message and places it in the Profile message queue. When a user changes location and real-time location is enabled, the information is placed in a Location message queue. The messages are processed by a Messaging Worker Role for delivery to the Notification Hub, which in turn delivers to the messaging service and then to the appropriate user’s device.
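The geographic split of the trip table described in the SQL Database item above can be illustrated with a lookup that maps a coordinate to a table name. This is a minimal sketch, not the system's actual schema: the table names, the `Partition` type, the `trip_table_for` function and the rough bounding boxes are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Partition(NamedTuple):
    table: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Hypothetical tables with rough, illustrative bounding boxes.
# Small, busy areas are listed first so they take precedence over larger ones.
PARTITIONS: List[Partition] = [
    Partition("trip_us_new_york_city", 40.4, 41.0, -74.3, -73.6),
    Partition("trip_us_montana", 44.3, 49.0, -116.1, -104.0),
]
DEFAULT_TABLE = "trip_other"  # catch-all for areas without a dedicated table


def trip_table_for(lat: float, lon: float) -> str:
    """Return the name of the trip table that covers the given coordinate."""
    for partition in PARTITIONS:
        if partition.contains(lat, lon):
            return partition.table
    return DEFAULT_TABLE
```

Splitting a table that has grown too large then amounts to adding a more specific entry ahead of the existing one and moving the affected rows.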

Service Tier

A number of Web Roles provide services (Account, Trip, Matching, Location, etc.) to the application front-end. A number of Worker Roles perform tasks (Messaging, Monitoring, etc.). A Caching Role provides temporary storage for speedy data access. These services are deployed in each datacenter.

  • Traffic Manager – When the application front-end makes a service request to the Service Layer, Traffic Manager directs the request to the Web Service that responds the fastest, which is typically the one in the datacenter that hosts the user’s application front-end. If that Web Service is unavailable, or is busy and has not responded soon enough, Traffic Manager directs the request to a Web Service in another datacenter.

  • Web Role – Account, Trip and Location Services expose their services to the application front-end. When a user registers an account, makes a profile change or a trip request, or changes location, the respective service connects to the appropriate data source and performs the appropriate actions. Web Roles are configured to auto-scale according to load.

  • Worker Role – Messaging Service processes messages in the message queues and places notifications into the Notification Hub (see below). Monitoring Service monitors the operational health of the system and sends alerts to the administrator when needed. It can automatically restart or reboot faulty services where necessary. Worker Roles are configured to auto-scale according to load.

  • Cache Role – Session ID, profile information, etc. are stored in Cache. The session ID is used by the application front-end to identify the user to the services. Selected profile information for active users is stored in cache for download to other users’ devices. If the profile information has already been downloaded to the device, its last-modified date is checked to prevent an unnecessary download. Other reference data, such as country names, labels and error messages in different languages, is also cached; if the version number of the data on the device is lower than the one in cache, new data is downloaded during application start-up.

  • Notification Hub – Notification Hub receives notifications generated by Messaging Service and publishes them to the appropriate notification services such as Apple, Android and Microsoft (see below).
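The selection rule described in the Traffic Manager item above (prefer the fastest responder, fall back to another datacenter when it is unavailable or too slow) can be modeled in a few lines. Azure Traffic Manager itself makes this decision at the DNS level; the sketch below only models the rule and is not its API. The `pick_endpoint` function, the endpoint names and the 500 ms timeout are assumptions for illustration.

```python
from typing import Dict, Optional


def pick_endpoint(latency_ms: Dict[str, Optional[float]],
                  timeout_ms: float = 500.0) -> Optional[str]:
    """Pick the fastest endpoint that responded within the timeout.

    latency_ms maps an endpoint name to its measured response time in
    milliseconds, or to None if the endpoint did not respond at all.
    Returns None when no endpoint is usable.
    """
    healthy = {
        name: ms
        for name, ms in latency_ms.items()
        if ms is not None and ms <= timeout_ms
    }
    if not healthy:
        return None
    return min(healthy, key=healthy.get)


# Hypothetical probe results as seen by a user in Europe.
normal = {"west-europe": 40.0, "west-us": 140.0, "east-asia": 260.0}
outage = {"west-europe": None, "west-us": 140.0, "east-asia": 260.0}
```

With the `normal` probes the nearby datacenter wins; with the `outage` probes the request falls over to the next fastest datacenter.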

Presentation Tier

The application front-end is an HTML5 application hosted in Azure Websites in each datacenter. Users can use any browser-capable device to access the application.

  • Traffic Manager – When a user requests the application front-end via a universal URL, Traffic Manager directs the request to the Website that responds the fastest, which is typically the Website in the same region as the user. If that Website is unavailable, or is busy and has not responded soon enough, Traffic Manager directs the request to a Website in another datacenter.

  • Login/Logout – The phone number is the key identifier. The user provides a phone number at first login. If so configured, the information is saved in a cookie for automatic login in the future.

  • Session ID – A unique session ID is created after a successful login. It is used to identify the session in every call to the service layer for authorization purposes. No other session state is needed or maintained at the service layer.

  • Reference Data – Certain information is persisted on the device and loaded into the application during startup. This includes labels and messages for the selected language. When the user changes language, a new set of language-dependent information is downloaded. A version number check at startup triggers a reload if necessary.

  • Profile Data – The user’s profile information is persisted on the device and loaded into the application during startup. Other users’ profile information (for drivers and passengers that have been fetched before) is also persisted on the device and loaded when necessary. A timestamp check triggers a reload if necessary.

  • Trip and Match Data – Trip and match data is loaded once at startup. As the user makes trip changes or receives messages about matches and locations, the data is updated locally.
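The version number check for reference data and the timestamp check for profile data described in the list above both reduce to a simple comparison. The sketch below assumes integer version numbers and timezone-aware timestamps; the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_reference_reload(device_version: Optional[int], cache_version: int) -> bool:
    """Reload reference data when the device has none or holds an older version."""
    return device_version is None or device_version < cache_version


def needs_profile_reload(device_modified: Optional[datetime],
                         server_modified: datetime) -> bool:
    """Reload a profile when the device copy is missing or older than the server copy."""
    return device_modified is None or device_modified < server_modified


# Example values: the device holds version 3 of the labels, the cache holds version 4.
labels_outdated = needs_reference_reload(device_version=3, cache_version=4)

# Example values: the profile on the device matches the server's last-modified date.
stamp = datetime(2015, 1, 1, tzinfo=timezone.utc)
profile_outdated = needs_profile_reload(device_modified=stamp, server_modified=stamp)
```

Because only a version number or a timestamp crosses the network when nothing has changed, the check keeps application startup cheap on slow mobile connections.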

 

Message Generator

A user, through the application and the Web Services, can generate messages. This includes:

  • Match change – When a user makes a trip request and the matching engine finds a match, Matching Service generates match messages (one for the driver and one for the passenger) and places them in a Match message queue. Likewise, when a user finds a match or cancels a trip, the other potential matches need to be sent messages notifying them that the match is no longer available.

  • Profile change – When a user updates his/her profile, the cached copy, if any, needs to be updated. A profile changed message is created and placed in a Profile message queue.

  • Location change – When a user changes location and has chosen to display location information to other users, a location changed message is created and placed in a Location message queue.
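The message generation described in the list above can be sketched with in-memory queues. The queues here are plain Python deques standing in for the Match, Profile and Location message queues, not the Service Bus API; the `Message` type and the function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable


@dataclass(frozen=True)
class Message:
    kind: str       # "match", "profile" or "location"
    recipient: str  # id of the user who should be notified
    payload: dict


# In-memory stand-ins for the three message queues (assumption for the sketch).
QUEUES: Dict[str, Deque[Message]] = {
    "match": deque(),
    "profile": deque(),
    "location": deque(),
}


def enqueue(message: Message) -> None:
    QUEUES[message.kind].append(message)


def on_match_found(driver_id: str, passenger_id: str, trip_id: str) -> None:
    """Create one match message for the driver and one for the passenger."""
    enqueue(Message("match", driver_id, {"trip_id": trip_id, "matched_with": passenger_id}))
    enqueue(Message("match", passenger_id, {"trip_id": trip_id, "matched_with": driver_id}))


def on_location_changed(user_id: str, lat: float, lon: float,
                        share_location: bool, watchers: Iterable[str]) -> None:
    """Queue location updates only when the user has chosen to share location."""
    if not share_location:
        return
    for watcher in watchers:
        enqueue(Message("location", watcher, {"user_id": user_id, "lat": lat, "lon": lon}))
```

Keeping one queue per message kind lets each kind be processed and scaled on its own, for example giving location updates more workers during rush hours.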

Message Processor

Messaging Service processes the messages in the different message queues and creates notifications with the proper format and information. The notifications are then placed in the Notification Hub.
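A minimal sketch of this processing step is shown below. It assumes that queued messages are dictionaries with `kind`, `recipient` and `payload` keys and that the hub can be modeled as a plain list; the notification titles and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical notification titles for each kind of queued message.
TITLES = {
    "match": "New match",
    "profile": "Profile updated",
    "location": "Location updated",
}


def format_notification(message: Dict) -> Dict:
    """Turn a queued message into a notification with a title and its data."""
    kind = message["kind"]
    if kind not in TITLES:
        raise ValueError(f"unknown message kind: {kind}")
    return {
        "recipient": message["recipient"],
        "title": TITLES[kind],
        "data": message["payload"],
    }


def drain(queue: Deque[Dict], hub: List[Dict]) -> int:
    """Process every message currently in the queue; return how many were handled."""
    handled = 0
    while queue:
        hub.append(format_notification(queue.popleft()))
        handled += 1
    return handled


# Example run with one queued match message.
match_queue: Deque[Dict] = deque([
    {"kind": "match", "recipient": "driver-1", "payload": {"trip_id": "trip-9"}},
])
hub: List[Dict] = []
processed = drain(match_queue, hub)
```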

Message Publisher

Notification Hub publishes messages to the appropriate Notification Service (Apple, Android, Windows) and ultimately to the recipients’ devices and application front-ends.
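The fan-out to platform notification services can be pictured as grouping one notification by the platforms of the recipient's registered devices. This models the idea only and is not the Notification Hubs API; the registry contents, the platform names and the `publish` function are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical device registry: user id -> list of (platform, device token).
REGISTRATIONS: Dict[str, List[Tuple[str, str]]] = {
    "driver-1": [("apple", "token-a"), ("windows", "token-w")],
    "passenger-1": [("android", "token-g")],
}


def publish(notification: Dict,
            registrations: Dict[str, List[Tuple[str, str]]] = REGISTRATIONS
            ) -> Dict[str, List[Dict]]:
    """Group one notification into per-platform deliveries, one per registered device."""
    outbox: Dict[str, List[Dict]] = defaultdict(list)
    for platform, token in registrations.get(notification["recipient"], []):
        outbox[platform].append({"token": token, "title": notification["title"]})
    return dict(outbox)


deliveries = publish({"recipient": "driver-1", "title": "New match"})
```

A user with a tablet, a laptop and a phone simply has three registrations, which is how a single match notification reaches whichever device is in use.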

 

 

Conclusion

 

It has been quite a nice learning experience. For architects, developers and engineers who have been using Microsoft technologies, the transition to Azure is not that hard: one builds upon what one already knows and picks up the missing pieces as needed. Azure’s flexible architecture helps architects design and build powerful, scalable and global architectures with relative ease. Developers continue to build with what they mostly already know, using familiar frameworks, languages and tools. Engineers expand their boundaries into the cloud with an easy-to-use portal and powerful tools. Besides the familiar features, Azure has many new ones that can be incorporated to build more powerful systems, and often there are different options that achieve the same goal. A key challenge for an architect is therefore to understand the requirements, study the options, then select and test the appropriate design before committing to a build. Done well, this yields a more secure, scalable, fast, redundant and reliable system, built with less time and effort, better quality, lower upfront costs and higher manageability. All of this is within reach of small and large corporations alike.