DiscoveryNet

From DiscoveryNet overview: (...)DiscoveryNet is building the software infrastructure and tools for providing Knowledge Discovery Services allowing scientists to conduct and manage complex data analysis and knowledge discovery activities over data generated by modern high throughput sensors. DiscoveryNet demonstrators include applications in life sciences, environmental modelling and geo-hazard prediction.
The project aims to design, develop and implement an advanced infrastructure to support real-time processing, interpretation, integration, visualization and mining of massive amounts of time critical data generated by high throughput devices. (...)

Project home page: http://www.discovery-on-the.net/ and some papers: http://www.discovery-on-the.net/new/documentation.php (I could not find any actual downloadable code)

(...)Discovery Net architecture provides open standards for specifying:

More info: (...) Workflow not only provides:

But also provides:

In one word, workflow represents the Knowledge of Action.
That is why we call the workflows in discovery informatics as Discovery Plans (...)

And more on DPML: (...)

(...)

DiscoveryNet and Workflows: they use the Discovery Process Markup Language (DPML), "which allows the definition of data analysis tasks to be executed on distributed resources", but I could not find more details about DPML even though they call it a standard ... (http://www.bioinformaticsworld.info/biwspr03datamining.html and http://www.discovery-on-the.net/new/documents/dnet_architecture.pdf). A short example of DPML is in the paper http://www.discovery-on-the.net/new/documents/kdd-DNET.pdf
DPML looks like a data flow language, and an interesting question is how it compares to WSFL. The description in http://www.discovery-on-the.net/new/documents/DiscoveryProcesses.pdf does not go into details:
(...) How scientists make use of computers to explore data is of central importance. Existing methods are largely ad-hoc using spreadsheets for data manipulation and separate algorithms packages for analysis. Where successful processes need to be automated, the traditional bioinformatics approach has been to create bespoke applications using scripting languages such as Perl. Recent approaches to representing discovery processes have been limited to using workflow languages such as WSFL[3] to define service composition for execution. These methodologies are labour intensive and problematic since the designer of the process is rarely the person who implements it as an application.
(...) The example below shows a simple DPML task where a microarray-generated gene expression data set has been manipulated to derive a new attribute, then passed to a K-means clustering node. In comparison to a traditional workflow language it does not include any implementation specific details. How nodes are mapped to actual components is left as a matter for the execution environment, which also performs verification of the process. Each node's operation is uniquely identified by an element that acts as a parameterisation message. The node's inputs are determined by connection elements that define the graph's structure. (...)

<?xml version="1.0" encoding="UTF-8"?>
<DPML xmlns="http://www.inforsense.com/DPML-1.0">
<nodegraph name="simple gene expression example">
<node id="903530559" name="Derive">
<Derive xmlns="http://www.inforsense.com/KDE-1.7">
<attribute name="avg" type="continuous"/>
<expression>avg(t1,t2,t3)</expression>
</Derive>
<history>
<change username="demo" date="2002-07-10 15:40:12"
comment="Node created"/>
<change username="demo" date="2002-07-10 15:40:47"
property="Derive expression" new="avg(t1,t2,t3)"/>
<change username="demo" date="2002-07-10 15:41:03"
property="Derived column name"
old="&lt;not specified&gt;" new="avg"/>
</history>
<notes>
<note>Find baseline expression for control</note>
<process name="CRISP-DM" step="3.3 Construct Data"/>
<location x="146" y="50"/>
</notes>
</node>
<node id="1585055100" name="gene expression">
<Table xmlns="http://www.inforsense.com/KDE-1.7"
query="Select * FROM &quot;synth_geneexpression&quot;"
imported_id="synth_geneexpression%973702954337_1"
sample="30" population="3000">
<attribute name="Gene" type="categorical"/>
<attribute name="t1" type="continuous"/>
<attribute name="t2" type="continuous"/>
<attribute name="t3" type="continuous"/>
</Table>
<history>
<change username="demo" date="2000-11-08 17:02:37"
comment="Node created"/>
</history>
<notes>
<process name="CRISP-DM" step="2.1 Collect Initial Data"/>
<location x="65" y="50"/>
</notes>
</node>
<node id="497265749" name="KMeansCluster">
<KMeansCluster xmlns="http://www.inforsense.com/KDE-1.7"
k="3" gamma="1.0" iterations="50"
distance_method="Euclidean" >
<input><attribute name="Gene"/><attribute name="t1"/>
<attribute name="t2"/><attribute name="t3"/>
<attribute name="avg"/></input>
</KMeansCluster>
<history>
<change username="demo" date="2002-07-10 15:41:12"
comment="Node created"/>
</history>
<notes>
<note>Find three clusters of similar genes</note>
<process name="CRISP-DM" step="4.3 Build Model"/>
<location x="238" y="50"/>
</notes>

DPML looks like a mix of declarative languages: they embed SQL in XML? This example does not look easy to use, so, as expected, the authors suggest that using a tool will hide this complexity:
(...) Effective use of DPML relies on a graphical client such as Kensington [1] that captures all details that DPML can represent, allowing users to construct tasks with a drag and drop visual programming environment, and interpret results with a rich set of visualisation modules. (meta-mining) can assist users by finding common patterns of activity and identifying useful processes or relevant experts to deal with a given situation. DPML and the architecture described above have been implemented as part of the DiscoveryNet [2] (...)
However, in this case, is the argument to abandon WSFL still valid when the workflow language is hidden by the GUI anyway?
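
To get a feel for what the DPML example above actually carries, here is a rough Python sketch that reads such a document with the standard library and lists, for each node, the operation element (the "parameterisation message"), its parameters, and the CRISP-DM step from the notes. This is only an illustration, not part of DiscoveryNet: the embedded fragment is an abbreviated copy of the example, the closing tags were added by me so that it parses, and the function name summarize is made up. The connection elements mentioned in the paper are not shown in the excerpt, so they are left out.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example above.
DPML_NS = "http://www.inforsense.com/DPML-1.0"
KDE_NS = "http://www.inforsense.com/KDE-1.7"

# Abbreviated copy of the example. Closing tags are an assumption added
# here so the fragment is well-formed; history, location and connection
# elements are omitted.
DPML_TEXT = """<DPML xmlns="http://www.inforsense.com/DPML-1.0">
<nodegraph name="simple gene expression example">
<node id="903530559" name="Derive">
<Derive xmlns="http://www.inforsense.com/KDE-1.7">
<attribute name="avg" type="continuous"/>
<expression>avg(t1,t2,t3)</expression>
</Derive>
<notes><process name="CRISP-DM" step="3.3 Construct Data"/></notes>
</node>
<node id="1585055100" name="gene expression">
<Table xmlns="http://www.inforsense.com/KDE-1.7"
 query="Select * FROM &quot;synth_geneexpression&quot;"
 sample="30" population="3000"/>
<notes><process name="CRISP-DM" step="2.1 Collect Initial Data"/></notes>
</node>
<node id="497265749" name="KMeansCluster">
<KMeansCluster xmlns="http://www.inforsense.com/KDE-1.7"
 k="3" gamma="1.0" iterations="50" distance_method="Euclidean"/>
<notes><process name="CRISP-DM" step="4.3 Build Model"/></notes>
</node>
</nodegraph>
</DPML>"""


def summarize(dpml_text):
    """Return one dict per <node>: id, operation name, parameters, step."""
    root = ET.fromstring(dpml_text)
    summary = []
    for node in root.iter(f"{{{DPML_NS}}}node"):
        # The operation is the child element that lives in the KDE namespace.
        op = next(c for c in node if c.tag.startswith(f"{{{KDE_NS}}}"))
        process = node.find(f"{{{DPML_NS}}}notes/{{{DPML_NS}}}process")
        summary.append({
            "id": node.get("id"),
            "operation": op.tag.split("}", 1)[1],  # strip "{namespace}"
            "params": dict(op.attrib),
            "step": process.get("step") if process is not None else None,
        })
    return summary


if __name__ == "__main__":
    for entry in summarize(DPML_TEXT):
        print(entry["id"], entry["operation"], entry["step"], entry["params"])
```

Even this small reader shows the mix noted above: the Table node carries an SQL query as an attribute value, while the other nodes carry plain algorithm parameters, and nothing in the document says which component will execute them.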


Subject: ProjectsList ScientificWorkflows Workflow