In order to exchange messages, publishers and listeners need to be able to contact each other. If the network is down and the two cannot reach each other, the connection must be retried repeatedly until the network is up and running. The agent middleware automates this process. If the network goes down while messages are being transmitted, messages should not be lost. To ensure this, the publisher agent logs each message before sending it, so that the message can be recovered if the network fails mid-transfer. Similarly, the listener agent uses persistent leases and tokens to make sure that all messages are retrieved.
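The following Java sketch illustrates this store-then-send discipline on the publisher side; the MessageSink interface, the log-file layout, and the fixed retry interval are illustrative assumptions rather than the actual middleware API.

\begin{verbatim}
import java.io.*;

interface MessageSink {
    // Attempts delivery; throws IOException when the network is down.
    void accept(byte[] batch) throws IOException;
}

class PublisherAgent {
    private final File log;         // persistent log: batches survive a crash
    private final MessageSink sink;

    PublisherAgent(File log, MessageSink sink) {
        this.log = log;
        this.sink = sink;
    }

    void publish(byte[] batch) throws IOException, InterruptedException {
        append(log, batch);         // 1. log first, so a failure loses nothing
        while (true) {
            try {
                sink.accept(batch); // 2. attempt delivery
                return;             // acknowledged: the logged entry can be marked done
            } catch (IOException networkDown) {
                Thread.sleep(5000); // 3. retry until the network is up again
            }
        }
    }

    private static void append(File f, byte[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream(f, true))) {
            out.writeInt(data.length);
            out.write(data);
        }
    }
}
\end{verbatim}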
If a message sink is unable to process a message and throws an exception back to the message source, the publisher agent traps the exception and logs it. The publisher agent also sends events in the same order in which they were received from the application, though it may occasionally send duplicates when a message has been delivered but its acknowledgment is lost. The tokens in the message batch are used to remove these duplicates.
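A minimal sketch of this token-based duplicate suppression on the receiving side follows; the class and method names are hypothetical.

\begin{verbatim}
import java.util.*;

class DuplicateFilter {
    private final Set<String> seenTokens = new HashSet<>();

    // Returns true if the batch is new; false if its token was already
    // processed, in which case the redelivered batch is simply dropped.
    synchronized boolean acceptBatch(String batchToken) {
        return seenTokens.add(batchToken); // Set.add is false for duplicates
    }
}
\end{verbatim}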
These features ensure that no messages are lost and that a ``best effort'' is made to assure delivery while the network is up.
One feature critical to a successful event system for scientific applications is the ability to store application-generated events permanently and allow users to execute queries on them. Such historical queries can be used to discover application behavior under different circumstances. The query object mentioned above enables the user to perform such data mining by allowing retrieval of both historical and live events through a consistent message request interface. In addition, by using an RDBMS to store the events in the back end, we give the user the option of applying sophisticated data mining tools. We have provided initial support for SQL-like queries, since the persistent message storage is currently based on an RDBMS. Other querying mechanisms, in particular simple template queries similar to those in JINI, will be added in the future.
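As an illustration, a historical query over the RDBMS back end might look like the JDBC fragment below; the EVENTS table layout and column names are assumptions made for this sketch, not the system's actual schema.

\begin{verbatim}
import java.sql.*;
import java.util.*;

class HistoricalQuery {
    // Fetch payloads of all events on a topic within a time window,
    // ordered by publication time.
    static List<byte[]> eventsBetween(Connection db, String topic,
            Timestamp from, Timestamp to) throws SQLException {
        String sql = "SELECT payload FROM events "
                   + "WHERE topic = ? AND published_at BETWEEN ? AND ? "
                   + "ORDER BY published_at";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setString(1, topic);
            ps.setTimestamp(2, from);
            ps.setTimestamp(3, to);
            List<byte[]> result = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) result.add(rs.getBytes("payload"));
            }
            return result;
        }
    }
}
\end{verbatim}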
Another factor that helps in performing historical queries, which tend to be large, is the ability to retrieve the message set in chunks rather than as one complete list of matching messages. This incremental retrieval allows the application to cancel a query midway without having to process all the messages originally asked for.
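The sketch below shows how an application might consume such a query incrementally and abandon it early; the ChunkedQuery interface and its nextChunk/cancel operations are hypothetical names standing in for the query object's chunked retrieval.

\begin{verbatim}
import java.util.*;

interface ChunkedQuery {
    List<byte[]> nextChunk(int maxMessages); // empty list when exhausted
    void cancel();                           // abandon the remaining matches
}

class QueryConsumer {
    static void drain(ChunkedQuery q, int chunkSize, long budgetMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        List<byte[]> chunk;
        while (!(chunk = q.nextChunk(chunkSize)).isEmpty()) {
            for (byte[] msg : chunk) process(msg);
            if (System.currentTimeMillis() > deadline) {
                q.cancel(); // stop midway; remaining matches are never fetched
                break;
            }
        }
    }
    private static void process(byte[] msg) { /* application-specific */ }
}
\end{verbatim}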
Another point in favor is the possibility of pulling messages through the listener agents to simulate a push model. This allows listeners behind firewalls to contact message sources and retrieve messages from them. Similarly, publishers behind firewalls can send messages to message sinks that are visible. Using this feature, we can create an entire messaging system in which only one host, running the message channel (comprising a message source and a message sink), needs to be visible, while all other listeners and publishers remain protected by firewalls.
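The outline below suggests how a firewalled listener agent could work: it makes only outbound calls to the visible message source, yet delivery still looks like a push to the application. The pollBatches call and the lastToken cursor are assumptions for illustration only.

\begin{verbatim}
import java.util.*;

interface MessageSource {
    // Returns batches published after the given token (an outbound call,
    // so no inbound connection through the firewall is needed).
    List<byte[]> pollBatches(String afterToken);
}

class FirewalledListenerAgent implements Runnable {
    private final MessageSource source;
    private String lastToken = ""; // resume point; a real agent persists this

    FirewalledListenerAgent(MessageSource source) { this.source = source; }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            for (byte[] batch : source.pollBatches(lastToken)) {
                deliverToApplication(batch); // appears as a push to the app
                lastToken = tokenOf(batch);
            }
            try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
        }
    }
    private void deliverToApplication(byte[] b) { /* subscriber callback */ }
    private String tokenOf(byte[] b) {
        return Integer.toHexString(Arrays.hashCode(b)); // placeholder token
    }
}
\end{verbatim}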
When building a messaging environment that spans multiple clusters, it helps to have many message channels, each servicing an optimal set of hosts yet able to exchange messages with listeners and publishers beyond the local channel. This can be achieved by interconnecting message channels to form a messaging network, or cloud. In such a system, each channel acts as a listener to the other channels and subscribes to their remote messages. A proper message passing algorithm is needed to ensure that there are no runaway messages that propagate indefinitely and that duplicate messages are not stored in the persistent store. Having a globally unique token for each message batch helps address both of these concerns.
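One simple way to realize this, sketched below under assumed names (ChannelNode, receive, persist), is for each channel to forward a batch to its peer channels only when the batch's globally unique token has not been seen before; a batch arriving a second time is dropped, so messages cannot circulate indefinitely or be stored twice.

\begin{verbatim}
import java.util.*;

class ChannelNode {
    private final Set<String> seenBatchTokens = new HashSet<>();
    private final List<ChannelNode> peers = new ArrayList<>();

    void connect(ChannelNode peer) { peers.add(peer); }

    // Called both for local publishes and for batches from peer channels.
    void receive(String batchToken, byte[] batch) {
        synchronized (seenBatchTokens) {
            if (!seenBatchTokens.add(batchToken)) return; // duplicate: drop
        }
        persist(batch);                                   // store exactly once
        for (ChannelNode peer : peers)
            peer.receive(batchToken, batch);              // propagate once
    }
    private void persist(byte[] batch) { /* channel's persistent store */ }
}
\end{verbatim}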