What is asymmetric clustering – Part 2

Symmetric and asymmetric clustering is one of the important topics in SCEA. In the last part we covered What is asymmetric clustering – Part 1.

In this post, we will discuss some questions and concepts on asymmetric clustering.

What kind of applications typically implemented on symmetric clusters might benefit from being refactored into asymmetric ones?

Applications with smaller data sets that experience high request volumes and a relatively high write-to-read ratio. This kind of application typically doesn’t scale horizontally because of contention between cluster members, even when using approaches such as optimistic locking.

Applications that require sequenced request handling, where a subset of the incoming events must be processed in a specific order. This can be implemented more efficiently with partitioning than with approaches based on database locking.
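
To make the sequencing idea concrete, here is a minimal Java sketch (the class and method names are invented for illustration, not taken from any WebSphere API): events that share a key are routed to the same single-threaded lane, so they are processed in arrival order, while events for different keys proceed in parallel.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of sequenced request handling without database locking:
// every event for a given key (e.g. a ticker symbol) is routed to the same
// single-threaded lane, so events sharing a key run strictly in order while
// events for different keys proceed in parallel. Names are illustrative.
public class OrderedDispatcher {
    private final ExecutorService[] lanes;

    public OrderedDispatcher(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void dispatch(String key, Runnable event) {
        // Deterministic key-to-lane mapping preserves per-key ordering.
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(event);
    }

    public void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }
}
```

The number of lanes plays the same role as the tunable thread count mentioned later: it caps the overall concurrency of the application.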

Applications that have dynamic messaging requirements. If the set of message queues or topics used by an application changes while the application is deployed, then a dynamic POJO message listener framework can be constructed with the exact threading model the application needs, rather than the usual cookie-cutter approach.

Applications that have very high incoming message rates, where it makes sense to split the incoming message feed so that a specific subset goes only to a single cluster member. The incoming topics are partitioned into groups using hashing or some other deterministic approach, and each cluster member receives messages only for topics assigned to the partitions it hosts. That cluster member can then aggressively cache state for its subset, which improves performance and offloads the database. This again enables horizontal scale-up, especially when message order is important.
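
A minimal sketch of the topic-splitting idea, assuming a fixed partition count and a deterministic hash (the class and method names are illustrative; in a real deployment the set of hosted partitions would come from the cluster runtime):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Maps topics to partitions deterministically so every cluster member agrees
// on the mapping, and filters the full topic list down to the topics a member
// should subscribe to, given the partitions it currently hosts.
public class TopicPartitioner {
    private final int partitionCount;

    public TopicPartitioner(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // The same topic always maps to the same partition on every member.
    public int partitionFor(String topic) {
        return Math.floorMod(topic.hashCode(), partitionCount);
    }

    // Topics this member should subscribe to, given the partitions it hosts.
    public List<String> topicsForMember(List<String> allTopics, Set<Integer> hostedPartitions) {
        return allTopics.stream()
                .filter(t -> hostedPartitions.contains(partitionFor(t)))
                .collect(Collectors.toList());
    }
}
```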

Can you give an example of a large-scale application and how you might refactor it into a partitioned, asymmetric model?

Let’s take an equity order matching system as an example. The system takes ordered streams of buy/sell commands as well as best price indications from other exchanges. It must match compatible buyers against sellers or route orders to an exchange when the pricing is favorable. The system must accept the buy/sell orders as well as the best price messages, and process them in order for a given tradable company. The set of companies being traded is dynamic and can change while the system is running. The application can be factored into two parts: a static part and a dynamic part.

The static part sets up a dynamic POJO container that provides dynamic topic subscription capabilities. A dynamic topic subscription service can be built as follows. Create a couple of daemon threads, each using its own JMS session. When a new topic subscription is required, the application provides the topic and a message listener callback to this service, and the service adds the subscription to one of the JMS sessions. The WebSphere-specific async beans APIs allow the application developer to build this kind of service pretty easily. The CommonJ APIs standardized between BEA and IBM can also be used, but they lack some APIs needed to make this service really robust; they are a subset of what’s possible with async beans. The static part also provides a specialized event delivery subsystem that guarantees all events for a given stream are processed in order, while events from different streams can be processed in parallel. Doug (Lea) told me that he’d seen this event delivery pattern before in a TP monitor called Genie. The number of threads used by this event delivery subsystem is tunable and specifies an upper bound on the amount of concurrency for the application.
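
A minimal sketch of such a dynamic topic subscription service, written against plain JMS rather than the WebSphere async beans API described above (the class names and the worker/queue structure are assumptions made for this sketch): each daemon worker owns one JMS Session, new subscriptions are handed to a worker through a queue, and the worker creates the consumer on its own thread so a Session is never used from two threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.Topic;

// Sketch of a dynamic topic subscription service built on plain JMS.
public class DynamicTopicSubscriptionService {

    private static final class Subscription {
        final String topicName;
        final MessageListener callback;
        Subscription(String topicName, MessageListener callback) {
            this.topicName = topicName;
            this.callback = callback;
        }
    }

    private final Connection connection;
    private final Worker[] workers;
    private int nextWorker;

    public DynamicTopicSubscriptionService(ConnectionFactory factory, int workerCount) throws JMSException {
        connection = factory.createConnection();
        connection.start();
        workers = new Worker[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Worker(connection.createSession(false, Session.AUTO_ACKNOWLEDGE));
            Thread t = new Thread(workers[i], "jms-subscription-worker-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    // Called by the application whenever a new topic appears at runtime.
    public synchronized void subscribe(String topicName, MessageListener callback) {
        workers[nextWorker].pending.add(new Subscription(topicName, callback));
        nextWorker = (nextWorker + 1) % workers.length;
    }

    private static final class Worker implements Runnable {
        final Session session;
        final ConcurrentLinkedQueue<Subscription> pending = new ConcurrentLinkedQueue<>();
        final List<MessageConsumer> consumers = new ArrayList<>();
        final List<MessageListener> callbacks = new ArrayList<>();

        Worker(Session session) { this.session = session; }

        @Override
        public void run() {
            try {
                while (true) {
                    // Pick up newly requested subscriptions on this thread,
                    // so the Session is only ever touched by its owner.
                    Subscription s;
                    while ((s = pending.poll()) != null) {
                        Topic topic = session.createTopic(s.topicName);
                        consumers.add(session.createConsumer(topic));
                        callbacks.add(s.callback);
                    }
                    // Poll every consumer and deliver messages to its callback.
                    boolean idle = true;
                    for (int i = 0; i < consumers.size(); i++) {
                        Message m = consumers.get(i).receiveNoWait();
                        if (m != null) {
                            callbacks.get(i).onMessage(m);
                            idle = false;
                        }
                    }
                    if (idle) {
                        Thread.sleep(10);   // back off briefly when nothing arrived
                    }
                }
            } catch (JMSException | InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```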

The dynamic part is partition-driven. When the application starts, it tells WebSphere that it wants to use, say, 512 partitions. WebSphere then starts each of the 512 partitions on exactly one cluster member. If there were 10 cluster members then roughly 51 partitions would activate on each cluster member. The application receives an event from WebSphere when a partition is activated in a JVM. The application then asks the database for all the symbols and state that are grouped together in this partition. This grouping can be done using historical data. The application then subscribes to all message feeds needed for the symbols assigned to that partition. The database state for the symbols (lists of buy/sell orders, best prices for exchanges, etc.) is cached using a write-through cache implemented with CMP. When an order or exchange price is received, it is sent to the sequencer component and then delivered to the application logic for handling orders or prices. The developer or administrator can specify policies to decide on which machine a partition can run. For example, partitions [0..3] can be grouped and run on machines A and B, and partitions [4..7] can be grouped and run on machines B and then A. The policy mechanism is very flexible and can be modified in real time by an administrator.
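
A hypothetical sketch of the per-partition activation flow, building on the earlier sketches (the partitionActivated method shape and the SymbolStore helper are invented for illustration; the real WebSphere partitioning facility exposes its own lifecycle API, which is not reproduced here):

```java
import java.util.List;
import javax.jms.Message;

// Hypothetical per-partition activation handler. In a real deployment this
// method would be wired to the partition lifecycle callback of the cluster
// runtime; that wiring is not shown here.
public class OrderBookPartitionHandler {

    // Assumed database access helper, invented for this sketch.
    public interface SymbolStore {
        List<String> symbolsForPartition(int partitionId);
    }

    private final SymbolStore store;
    private final DynamicTopicSubscriptionService subscriptions; // from the earlier sketch
    private final OrderedDispatcher sequencer;                   // from the earlier sketch

    public OrderBookPartitionHandler(SymbolStore store,
                                     DynamicTopicSubscriptionService subscriptions,
                                     OrderedDispatcher sequencer) {
        this.store = store;
        this.subscriptions = subscriptions;
        this.sequencer = sequencer;
    }

    // Called once for each partition activated in this cluster member.
    public void partitionActivated(int partitionId) {
        // 1. Ask the database for the symbols grouped into this partition.
        for (String symbol : store.symbolsForPartition(partitionId)) {
            // 2. Subscribe to the order and price feeds for each symbol and
            //    route every message through the sequencer so events for a
            //    given symbol are handled in order.
            subscriptions.subscribe("orders." + symbol,
                    msg -> sequencer.dispatch(symbol, () -> handleOrder(symbol, msg)));
            subscriptions.subscribe("prices." + symbol,
                    msg -> sequencer.dispatch(symbol, () -> handlePrice(symbol, msg)));
        }
    }

    private void handleOrder(String symbol, Message msg) { /* matching logic goes here */ }
    private void handlePrice(String symbol, Message msg) { /* order routing logic goes here */ }
}
```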

We can demonstrate that application running on 6 dual-processor blades at around 3k TPS, where each transaction consists of a database transaction with 12 statements and five outgoing messages. This represents the worst-case workload of the application using real-time matching, when every incoming exchange price indication results in a match. More blades give linear scale-up. The database also needs to be partitioned, both from an availability point of view and for horizontal scalability. We used a quad-CPU Intel box running DB2 with a Network Appliance F940 iSCSI server for disks. The database was running at 98% CPU load at 3.6k transactions per second, and these transactions were modifications, not simple cache hits.

Some of that sounds like a container implementation.

Well, people may look at these static and dynamic portions and claim: “Hey, we could build a container for those and save the developers from the chore of writing this code.” Clearly, we could, and we may, but I’d prefer to provide the primitives used above as tools for application developers and let them build these components. I call these types of component or plumbing “micro-containers”. I think the ability for developers to write micro-containers is a key new feature for OLTP middleware in order to meet the performance goals of an application in an economical manner. WebSphere provides APIs like partitioning and threading (async beans) to allow this. Micro-containers that prove themselves to be successful patterns can be componentized and reused in other applications, made available as open source for other customers, or incorporated into off-the-shelf middleware once they are proven. Most of the current open source lightweight containers are basically just copying what’s in J2EE but using POJOs instead of EJBs. I think we’ll start to see more interesting, really innovative containers once customers start to exploit these kinds of features in WebSphere.
