Next: Multi-protocol design Up: Performance results Previous: Test set C: Gantt

   
Observations

Figure 4 compares throughput for Sun RMI, Nexus RMI and SOAP RMI. Performance was also compared against the transfer of serialized array and linked list data over a raw socket connection. In general, SOAP RMI is approximately ten times slower than Sun's implementation, which is not surprising considering the relative sizes of the data that must be sent for the same object.

   
Figure 4: Roundtrip throughputs between E10Ks for linked list

Surprisingly, SOAP RMI outperforms Nexus RMI for small data sizes (Figures 11 and 12). For the linked list example, the cross-over point was around 10 nodes; for double arrays it was around 100 elements. Since SOAP RMI is consistently slower in the corresponding serialize/deserialize tests, this suggests that the effect comes from the buffering implementations and possibly from the cost of conversion to network representations. It also implies that SOAP is not only acceptable for small messages, such as control signals or small parameter exchanges between remote components, but is actually preferable in some cases.

Sun's native Java serialization and deserialization is closely tied to Java and, not surprisingly, turns out to be the most efficient of the four. Nexus's serialization and deserialization protocols are also binary, like Sun's, but the serialization is designed to be interoperable with C/C++, which adds some overhead relative to Sun's native Java serialization. Apache SOAP uses DOM [10] for deserialization. It allows different encoding styles to be plugged in: SOAP v1.1 encoding, literal XML and XMI (XML Metadata Interchange [21]). The nanoSOAP implementation is a fast, simple implementation of serialization and deserialization of Java into SOAP v1.0 encodings, and it uses SAX [16] for deserialization.
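To illustrate the SAX style of deserialization mentioned above, the following is a minimal sketch using the standard Java SAX API. It is not the nanoSOAP code: the class name `SaxDoubleReader`, the wrapper element `array` and the flat `<double>` layout are assumptions made for illustration and do not reproduce the actual SOAP encoding. The point is only that values are collected as the parser streams through the input, without building a DOM tree first.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reads every <double> element from an XML string via SAX callbacks. */
final class SaxDoubleReader {

    static List<Double> parse(String xml) {
        final List<Double> values = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inDouble;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("double".equals(qName)) {
                    inDouble = true;
                    text.setLength(0); // start collecting a fresh value
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (inDouble) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("double".equals(qName)) {
                    values.add(Double.parseDouble(text.toString().trim()));
                    inDouble = false;
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return values;
    }

    public static void main(String[] args) {
        // The element names here are illustrative only.
        System.out.println(parse("<array><double>1.5</double><double>-2.0</double></array>"));
    }
}
```

A DOM-based reader would instead materialize the whole document in memory before any value could be extracted, which is one plausible source of extra cost for large messages.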

  
Figure 5: Serialization-deserialization for linked list on E10K (top: speed, bottom: throughput).

Figure 6: Serialization-deserialization sizes for linked list (top) and array (bottom) on E10K.

Figure 6 shows that the size of serialized data types in SOAP is approximately ten times larger than in Sun's native serialization. This increase in size comes from the translation of binary data into text. For example, in Java, each double takes 8 bytes. The string representation in XML of a double with 16 digits of precision takes at least 16 characters (and so 16 bytes in UTF-8), in addition to the 17 bytes for the tags <double> and </double>. Thus, each double serialized into XML takes at least 33 bytes. If the data is transferred in a two-byte Unicode encoding, that estimate doubles. In any case, the XML representation of a double array is at least a factor of four larger than the binary one. Since SOAP uses XML for data representation, this overhead is intrinsic to the SOAP protocol and cannot be removed by choosing a better implementation.
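The arithmetic above can be checked with a short sketch. The class name `DoubleSizeEstimate` and its helper method are hypothetical, the digit string is a placeholder rather than real data, and UTF-16 (big-endian, without a byte order mark) is assumed as the two-byte encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: byte size of one double as a tagged XML element versus raw binary. */
final class DoubleSizeEstimate {
    // Tag names follow the <double>...</double> example in the text.
    static final String OPEN_TAG = "<double>";
    static final String CLOSE_TAG = "</double>";

    /** Bytes needed for one element whose text content has the given number of characters. */
    static int xmlBytes(int digits, Charset charset) {
        StringBuilder element = new StringBuilder(OPEN_TAG);
        for (int i = 0; i < digits; i++) {
            element.append('9'); // placeholder digit; only the length matters here
        }
        element.append(CLOSE_TAG);
        return element.toString().getBytes(charset).length;
    }

    public static void main(String[] args) {
        int binary = Double.BYTES;                            // 8 bytes per double in Java
        int utf8 = xmlBytes(16, StandardCharsets.UTF_8);      // 16 digits + 17 tag bytes
        int utf16 = xmlBytes(16, StandardCharsets.UTF_16BE);  // two bytes per character, no BOM
        System.out.println("binary: " + binary + " bytes");
        System.out.println("XML, UTF-8: " + utf8 + " bytes");
        System.out.println("XML, UTF-16: " + utf16 + " bytes");
        System.out.println("UTF-8 expansion factor: " + (double) utf8 / binary);
    }
}
```

Under these assumptions the UTF-8 element is 33 bytes and the UTF-16 element is 66 bytes, which gives expansion factors of roughly 4 and 8 over the 8-byte binary value, before any envelope or per-array markup is counted.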

Serializing Java objects into SOAP-encoded XML data takes approximately ten times more memory than the binary representation. Figure 5 compares serialization and deserialization throughputs on the E10K. Sun's native serialization-deserialization is the fastest, and Nexus's performance is comparable to Sun's. Serialization and deserialization speeds for the SOAP-based implementations (Apache SOAP and nanoSOAP) are approximately 100 times slower, and their throughputs are also about 100 times lower.
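As a rough indication of how a serialize/deserialize round trip can be timed for the native Java case, here is a minimal sketch. It is not the benchmark harness used for the figures: the class name `NativeSerDes` is hypothetical, `java.util.LinkedList` stands in for the linked list type used in the tests, and a single unwarmed timing like this is only indicative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LinkedList;

/** Hypothetical helper: round trip through Java's built-in object serialization. */
final class NativeSerDes {

    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush(); // make sure buffered data reaches the byte array
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a list of n doubles as stand-in test data. */
    static LinkedList<Double> makeList(int n) {
        LinkedList<Double> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(i * 0.5);
        }
        return list;
    }

    public static void main(String[] args) {
        LinkedList<Double> list = makeList(1000);
        long start = System.nanoTime();
        byte[] data = serialize(list);
        Object copy = deserialize(data);
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("%d bytes, %.3f ms round trip, equal=%b%n",
                data.length, elapsedNanos / 1e6, list.equals(copy));
    }
}
```

A more careful measurement would repeat the round trip many times after a warm-up phase and report throughput as bytes per second, so that JIT compilation and class loading do not dominate the result.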

Some additional observations can be made from the experiments:

The Gantt diagrams from Test C in Figure 7 show that the costs of actual communication and data copying are considerably smaller than the time spent on serialization and deserialization of XML-encoded messages. Serialization and deserialization therefore need to be the primary targets for improving the performance of SOAP-based protocols. The graph data suggests that at least a thirty percent improvement would be possible if serialization and deserialization were pipelined.

  
Figure 7: Gantt diagram for roundtrip for linked list of size 1000 on sparc10

In more detail, the time taken for the roundtrip linked list example may be divided into the following segments:


SoapTeam: Extreme Computing Lab
2000-08-17