The replication document outlines the path a record takes during replication. In brief, a change takes place in Banner and an XML representation of a record is enqueued into an Oracle Advanced Queue (ED_VTPEOPLE_QUEUE). The Middleware Banner Replication SAR then asynchronously receives the XML record and sends it to the Banner Change bean for processing. Banner Change then uses the registry-manage beans to update the Registry if a change has taken place. If no change has taken place, replication ends. The writes to the Registry cause triggers to fire which enqueue an XML message into an Oracle AQ (ED_REGCHANGES_QUEUE), where the Registry Replication SAR receives the XML and calls the Registry Query beans to build a DSML representation of the record. This DSML is then sent to the Registry Change Bean where XSLT is performed to generate SPML for all the downstream consumers. The SPML is then enqueued into a JBossMQ Topic for such downstream consumers as the ED-LDAP Replication Service. Those services dequeue the SPML and update their corresponding servers as necessary.
This replication architecture, though extremely reliable, takes approximately one second to replicate from Banner to the Registry, and approximately one second to replicate from the Registry to downstream consumers, such as ED-LDAP. When a large number of records are enqueued into either of the Oracle Advanced Queues, replication can take a long time to complete. The problem is worse on the Registry to downstream consumer side, as some writes can cause multiple triggers to fire, resulting in records replicating unnecessarily. This slow replication causes users of downstream systems to view data that differs from the data that is in the Registry, which can be an annoyance to those clients since they must wait for the queues to empty before reliable data can be obtained.
We will present three solutions that should help decrease the total amount of time it takes to do downstream replication.
It should be noted that we hope an upgrade to EJB3 will give us some general speed improvements, though we cannot know that until we complete development. We also should check that we are using prepared statements for replication queries to avoid unnecessary Oracle overhead. Both of these will most likely speed up replication to some degree.
The Registry triggers fire for every add, delete, and update when a bean method writes to the database. As such, triggers can fire multiple times when a single transaction takes place. If we turn off the triggers and replicate a single record for each such transaction, far fewer records need to be replicated from the Registry to downstream consumers.
This solution speeds up replication by decreasing the number of records that need to be built and sent downstream. This is accomplished by using EJB3 interceptors that treat these writes as a single transaction.
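The coalescing idea can be sketched in plain Java. This is only an illustration under assumptions, not the Registry code: the class `CoalescingReplicator`, its methods `recordWrite` and `around`, and the `enqueue` callback are all hypothetical names. In an EJB3 deployment the body of `around` would presumably live in an interceptor method, but the sketch avoids any container API so that it stands alone.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical stand-in for the work an EJB3 interceptor could do. */
class CoalescingReplicator {
    // Record IDs written by the business method currently running on this thread.
    private final ThreadLocal<Set<String>> touched = new ThreadLocal<>();
    // Hypothetical callback that enqueues one replication message for a record ID.
    private final Consumer<String> enqueue;

    CoalescingReplicator(Consumer<String> enqueue) {
        this.enqueue = enqueue;
    }

    /** Called for each add, delete, or update in place of a trigger firing. */
    void recordWrite(String recordId) {
        Set<String> ids = touched.get();
        if (ids == null) {
            enqueue.accept(recordId); // no wrapped call in progress: replicate at once
        } else {
            ids.add(recordId);        // the set silently drops repeated IDs
        }
    }

    /** Wraps one business method; enqueues each touched record once, after it returns. */
    void around(Runnable businessMethod) {
        if (touched.get() != null) {  // nested call: the outermost call will flush
            businessMethod.run();
            return;
        }
        Set<String> ids = new LinkedHashSet<>();
        touched.set(ids);
        try {
            businessMethod.run();
        } finally {
            touched.remove();
        }
        // Reached only when the method completed normally; a failure enqueues nothing.
        ids.forEach(enqueue);
    }
}
```

With this shape, a bean method that writes to the same person record three times and to a second record once would produce two replication messages instead of four.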
With the current replication architecture, messages are received one at a time. If we move from a MessageListener to using TopicSubscriber.receive(), we can then receive a preset number of messages (waiting for a specific amount of time in case there are no other messages), remove the duplicates within that set, use a pool of record builders to build the messages, and then acknowledge all of those messages at once.
The benefit of this solution is that unnecessary records aren't built, and those that are built can be built using some kind of threaded builder. Building records in this manner should reduce the total amount of time building records, and thus speed up replication.
It should be noted that this solution is optimized for large amounts of replication records. On average this approach will be somewhat slower since you must wait for the timeout to occur.
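The batch approach described above could look roughly like the following. It is a sketch under assumptions: a `BlockingQueue` of record IDs stands in for the topic subscriber (its timed `poll` plays the role of a timed `receive`), and `BatchReceiver`, `receiveBatch`, `buildAll`, and the `builder` function are hypothetical names. Acknowledgement is left out; in a real JMS consumer it would happen after the whole batch has been built.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class BatchReceiver {
    /** Takes up to maxBatch IDs, waiting at most timeoutMs for each, and drops duplicates. */
    static Set<String> receiveBatch(BlockingQueue<String> source, int maxBatch, long timeoutMs) {
        Set<String> ids = new LinkedHashSet<>();
        try {
            for (int i = 0; i < maxBatch; i++) {
                String id = source.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (id == null) {
                    break; // timed out: nothing else is waiting, so process what we have
                }
                ids.add(id); // a repeated ID is stored only once
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, return the partial batch
        }
        return ids;
    }

    /** Builds one record per ID on a fixed-size thread pool, preserving the ID order. */
    static List<String> buildAll(Set<String> ids, Function<String, String> builder, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String id : ids) {
                Callable<String> task = () -> builder.apply(id);
                pending.add(pool.submit(task));
            }
            List<String> built = new ArrayList<>();
            for (Future<String> f : pending) {
                built.add(f.get()); // blocks until that record has been built
            }
            return built;
        } catch (Exception e) {
            throw new IllegalStateException("record build failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The timeout is the cost mentioned above: when only one message is waiting, `receiveBatch` still waits `timeoutMs` for a second one before returning.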
When a record is built for replication, a search must always be performed against the Registry. If we turn on JBoss caching, the number of searches that go against the Registry will decrease, as those records may be in the JBoss cache.
This solution limits the overhead of always going to the database for replication data. It requires that Registry writes go through the beans, as any other writes will cause the cache to be stale.
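The trade-off can be shown with a small read-through cache. This is not the JBoss cache API, only a plain-Java sketch of the behaviour under assumptions: `RecordCache`, `onBeanWrite`, and the `registryLookup` function are hypothetical names, and records are reduced to strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class RecordCache {
    private final Map<String, String> entries;
    // Hypothetical function that performs the real search against the Registry.
    private final Function<String, String> registryLookup;
    // Counts how many lookups actually reached the Registry.
    int registrySearches = 0;

    RecordCache(final int capacity, Function<String, String> registryLookup) {
        this.registryLookup = registryLookup;
        // Access-ordered map that evicts the least recently used entry when full.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached record, searching the Registry only on a miss. */
    synchronized String get(String id) {
        String cached = entries.get(id);
        if (cached != null) {
            return cached;
        }
        registrySearches++;
        String fresh = registryLookup.apply(id);
        if (fresh != null) {
            entries.put(id, fresh);
        }
        return fresh;
    }

    /** Must be called by every bean write so that the next read goes to the Registry. */
    synchronized void onBeanWrite(String id) {
        entries.remove(id);
    }
}
```

A write that bypasses the beans never calls `onBeanWrite`, so `get` keeps returning the old value until the entry is evicted, which is the stale-cache risk noted above.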
Is the caching something you could turn on soon (in a test environment), without changing code? It seems to me that any agreed-upon improvement should be done sequentially, if possible. Thank you for the analysis! — Mary B Dunker 2007/08/30 11:20
It should be noted that at the last ED Technical meeting I asked for proposed solutions (to the performance issues) that could be discussed and vetted by the group. It has not been decided which of the suggestions made by Middleware make sense to be implemented, other than the fact that removing the database triggers cannot be implemented. This was discussed in the last meeting and Kevin and I indicated that totally removing the triggers could not be an option because it would break some of IRM's current processes. IRM's business processes depend upon that functionality existing. IRM did commit to doing an analysis of the triggers to see how we might cause them to fire more efficiently. This could be a contribution to the solution. My feeling is that the improvement in performance will come from implementing several things in conjunction rather than there being one thing we can do that will give us a huge performance gain. — Karen M Herrington 2007/08/30 13:05
Once IRM analyzes the triggers to see if they can fire more efficiently, I think it would help to have a joint meeting of IRM and Middleware to review possible solutions. I agree that several things probably need to be done; I was just suggesting they be implemented one at a time so the perceived performance improvement can be measured. — Mary B Dunker 2007/08/30 15:22