Project home page: http://www.discovery-on-the.net/ and some papers: http://www.discovery-on-the.net/new/documentation.php (I could not find any downloadable code)
(...) Discovery Net architecture provides open standards for specifying:
More info: (...) Workflow not only provides:
And more on DPML: (...)
DiscoveryNet and Workflows: they use Discovery Process Markup Language (DPML)
"which allows the definition of data analysis tasks to be executed on distributed resources."
but I could not find more details about DPML, even though they call it a standard...
(http://www.bioinformaticsworld.info/biwspr03datamining.html
and http://www.discovery-on-the.net/new/documents/dnet_architecture.pdf).
A short example of DPML is in the paper http://www.discovery-on-the.net/new/documents/kdd-DNET.pdf
DPML looks like a data flow language, and an interesting question is:
how does it compare to WSFL? The description in
http://www.discovery-on-the.net/new/documents/DiscoveryProcesses.pdf
does not go into details:
(...)
How scientists make use of computers to explore
data is of central importance. Existing methods are
largely ad-hoc using spreadsheets for data
manipulation and separate algorithms packages for
analysis. Where successful processes need to be
automated, the traditional bioinformatics approach
has been to create bespoke applications using
scripting languages such as Perl. Recent approaches
to representing discovery processes have been
limited to using workflow languages such as
WSFL[3] to define service composition for
execution. These methodologies are labour intensive
and problematic since the designer of the process is
rarely the person who implements it as an
application.
(...)
The example below shows a simple DPML task
where a microarray-generated gene expression data
set has been manipulated to derive a new attribute,
then passed to a K-means clustering node. In
comparison to a traditional workflow language, it
does not include any implementation-specific details.
How nodes are mapped to actual components is left
as a matter for the execution environment, which
also performs verification of the process. Each
node's operation is uniquely identified by an
element that acts as a parameterisation message. The
node's inputs are determined by connection elements
that define the graph's structure.
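To make the node/connection model concrete, here is a minimal Python sketch of what an execution environment might do with a DPML node graph: read each node's operation element (its parameterisation message), check that some component is registered for it (a stand-in for "verification"), and derive a run order from the connections. Assumptions: the element names and namespaces are copied from the paper's example, but the `<connection from=... to=...>` element is made up, because the excerpt does not show DPML's actual connection syntax; the `COMPONENTS` registry and its component names are hypothetical.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # Python 3.9+

DPML_NS = "http://www.inforsense.com/DPML-1.0"
KDE_NS = "http://www.inforsense.com/KDE-1.7"

# Trimmed copy of the paper's example. The <connection> elements are
# HYPOTHETICAL: the published excerpt does not show DPML's real syntax for them.
DOC = f"""<DPML xmlns="{DPML_NS}">
  <nodegraph name="simple gene expression example">
    <node id="1585055100" name="gene expression">
      <Table xmlns="{KDE_NS}" query="Select * FROM &quot;synth_geneexpression&quot;"/>
    </node>
    <node id="903530559" name="Derive">
      <Derive xmlns="{KDE_NS}"><expression>avg(t1,t2,t3)</expression></Derive>
    </node>
    <node id="497265749" name="KMeansCluster">
      <KMeansCluster xmlns="{KDE_NS}" k="3" iterations="50"
                     distance_method="Euclidean"/>
    </node>
    <connection from="1585055100" to="903530559"/>
    <connection from="903530559" to="497265749"/>
  </nodegraph>
</DPML>"""

# HYPOTHETICAL registry: operation element name -> component that implements it.
COMPONENTS = {
    "Table": "sql-reader",
    "Derive": "expression-evaluator",
    "KMeansCluster": "kmeans-service",
}


def load_graph(xml_text):
    """Return ({node_id: operation name or None}, [(from_id, to_id), ...])."""
    ns = {"d": DPML_NS}
    graph = ET.fromstring(xml_text).find("d:nodegraph", ns)
    ops = {}
    for node in graph.findall("d:node", ns):
        # The node's operation is its child element in the KDE namespace;
        # that element's attributes and children carry the parameters.
        op = next((c for c in node if c.tag.startswith("{%s}" % KDE_NS)), None)
        ops[node.get("id")] = op.tag.split("}", 1)[1] if op is not None else None
    edges = [(c.get("from"), c.get("to")) for c in graph.findall("d:connection", ns)]
    return ops, edges


def verify(ops, edges, components):
    """Collect problems a (hypothetical) execution environment would reject."""
    problems = []
    for node_id, op in ops.items():
        if op is None:
            problems.append(f"node {node_id}: no operation element")
        elif op not in components:
            problems.append(f"node {node_id}: no component for {op}")
    for src, dst in edges:
        for end in (src, dst):
            if end not in ops:
                problems.append(f"connection refers to unknown node {end}")
    return problems


def execution_order(ops, edges):
    """Order nodes so each runs after its inputs; raises graphlib.CycleError on cycles."""
    ts = TopologicalSorter({node_id: set() for node_id in ops})
    for src, dst in edges:
        ts.add(dst, src)  # dst consumes the output of src
    return list(ts.static_order())


if __name__ == "__main__":
    ops, edges = load_graph(DOC)
    assert verify(ops, edges, COMPONENTS) == []
    for node_id in execution_order(ops, edges):
        print(node_id, ops[node_id], "->", COMPONENTS[ops[node_id]])
```

Running it prints the three nodes in data-flow order: the table, then Derive, then KMeansCluster. A topological sort is only one plausible strategy; since DiscoveryNet targets distributed resources, a real engine could just as well dispatch independent branches in parallel (which `TopologicalSorter.get_ready()` and `done()` would support).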
(...)
<?xml version="1.0" encoding="UTF-8"?>
<DPML xmlns="http://www.inforsense.com/DPML-1.0">
  <nodegraph name="simple gene expression example">
    <node id="903530559" name="Derive">
      <Derive xmlns="http://www.inforsense.com/KDE-1.7">
        <attribute name="avg" type="continuous"/>
        <expression>avg(t1,t2,t3)</expression>
      </Derive>
      <history>
        <change username="demo" date="2002-07-10 15:40:12" comment="Node created"/>
        <change username="demo" date="2002-07-10 15:40:47" property="Derive expression" new="avg(t1,t2,t3)"/>
        <change username="demo" date="2002-07-10 15:41:03" property="Derived column name" old="&lt;not specified&gt;" new="avg"/>
      </history>
      <notes>
        <note>Find baseline expression for control</note>
        <process name="CRISP-DM" step="3.3 Construct Data"/>
        <location x="146" y="50"/>
      </notes>
    </node>
    <node id="1585055100" name="gene expression">
      <Table xmlns="http://www.inforsense.com/KDE-1.7" query="Select * FROM &quot;synth_geneexpression&quot;" imported_id="synth_geneexpression%973702954337_1" sample="30" population="3000">
        <attribute name="Gene" type="categorical"/>
        <attribute name="t1" type="continuous"/>
        <attribute name="t2" type="continuous"/>
        <attribute name="t3" type="continuous"/>
      </Table>
      <history>
        <change username="demo" date="2000-11-08 17:02:37" comment="Node created"/>
      </history>
      <notes>
        <process name="CRISP-DM" step="2.1 Collect Initial Data"/>
        <location x="65" y="50"/>
      </notes>
    </node>
    <node id="497265749" name="KMeansCluster">
      <KMeansCluster xmlns="http://www.inforsense.com/KDE-1.7" k="3" gamma="1.0" iterations="50" distance_method="Euclidean">
        <input>
          <attribute name="Gene"/>
          <attribute name="t1"/>
          <attribute name="t2"/>
          <attribute name="t3"/>
          <attribute name="avg"/>
        </input>
      </KMeansCluster>
      <history>
        <change username="demo" date="2002-07-10 15:41:12" comment="Node created"/>
      </history>
      <notes>
        <note>Find three clusters of similar genes</note>
        <process name="CRISP-DM" step="4.3 Build Model"/>
        <location x="238" y="50"/>
      </notes>
(...)

DPML looks like a mix of declarative languages: they use SQL in XML? And this example does not look easy to use, so, as expected, the authors suggest that using a tool will hide this complexity: