This paper is an overview of dataflow programming languages and architectures. It traces the formation of current dataflow thought from its origins to the present.

 

It is interesting to see the two perspectives of research in this area. One perspective seeks textual or visual dataflow languages that ease parallelization and determinism and lend themselves to formal analysis; the other simply seeks to write functional, efficient code. The former group works from a perspective of theory, proof, and mathematical rigor, and the latter from a perspective of practicality. The paper makes reference to this at several points, and explicitly recognizes it in the section on non-determinism:

 

The dichotomy in regard to nondeterminism appears to be the result of a division between those who wish to use dataflow as a means to ease the formal proof and analysis of programs and those who wish to use dataflow for all varieties of programming problems. The former regard determinacy as essential, whereas the latter regard the provision of nondeterminacy as essential.

 

It is unfortunate that Max/MSP and Pd are conspicuously missing from the paper's discussion.

 

Useful statements:

0.  The "no change" of value requirement seems to be the hardest issue to get around in pure data flow languages.

 

The best list of features that constitute a dataflow language was put forward by Ackerman [1982] and reiterated by Whiting and Pascoe [1994] and Wail and Abramson [1995]. This list includes the following:

(1) freedom from side effects,

(2) locality of effect,

(3) data dependencies equivalent to scheduling,

(4) single assignment of variables,

(5) an unusual notation for iterations due to features 1 and 4,

(6) lack of history sensitivity in procedures.

 

Because scheduling is determined from data dependencies, it is important that the value of variables do not change between their definition and their use.
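The dependencies-equal-scheduling idea can be sketched in a few lines of Python. This is a minimal sketch, assuming single assignment; the tiny graph, node names, and the `schedule` helper are invented for illustration and are not from the paper:

```python
# Sketch: under single assignment, data dependencies alone determine a
# valid firing order (a simple topological sort of the dataflow graph).

def schedule(graph):
    """Return a firing order where every node fires only after the
    nodes producing its inputs have fired."""
    order, done = [], set()

    def visit(node):
        if node in done:
            return
        for dep in graph[node]:   # inputs must be produced first
            visit(dep)
        done.add(node)
        order.append(node)

    for node in graph:
        visit(node)
    return order

# a and b are inputs; c = a + b; d = c * a  (each node lists its inputs)
graph = {"a": [], "b": [], "c": ["a", "b"], "d": ["c", "a"]}
order = schedule(graph)
```

If `a` could be reassigned between the firing of `c` and `d`, the schedule derived from dependencies alone would no longer be valid, which is exactly why the quoted requirement matters.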

 

1. It was realized that the parallelism used in dataflow architectures operated at too fine a grain and that better performance could be obtained through hybrid von Neumann dataflow architectures. Many of these architectures [Bic 1990] took advantage of more coarse-grained parallelism where a number of dataflow instructions were grouped and executed in sequence.

 

2. It is clear that dataflow provides the potential to provide a substantial speed improvement by utilizing data dependencies to locate parallelism.

 

3. About the flow of data through a fork:

 

In Figure 1(b) the two arcs emanating from input Y signify that that value is to be duplicated. Forking arcs in this manner is essential if a data token is needed by two different nodes. A data token that reaches a forked arc gets duplicated and a copy sent down each branch. This preserves the data independence and the functionality of the system.
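The duplication behavior at a fork can be sketched as follows. This is an illustrative model, not the paper's formalism; the `Arc` class and `fork` helper are invented names:

```python
import copy

# Sketch: a fork duplicates each arriving token, sending an independent
# copy down each output arc, so downstream nodes cannot interfere.

class Arc:
    def __init__(self):
        self.tokens = []

    def put(self, token):
        self.tokens.append(token)

def fork(token, out_arcs):
    # Each branch receives its own copy, preserving data independence.
    for arc in out_arcs:
        arc.put(copy.deepcopy(token))

left, right = Arc(), Arc()
fork([1, 2], [left, right])
left.tokens[0].append(3)   # mutating one branch's copy...
# ...leaves the other branch's copy untouched.
```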

 

4. Alternate data flow: two types of data flow are described. In one, data is passed along from node to node, duplicated at forks, and merged or switched at joins. In the other model, each node creates its own representation of the data, keeping its own copy.

 

In the structure model, however, each node creates only one data object on each arc that remains there: the node builds one or more data structures on its output arcs. It is possible for these structures to hold infinite arrays of values, permitting open-ended execution, and creating the same effect as the token model, but with the advantage that the structure model permits random access of the data structures and history sensitivity.

 

and:

 

However, the structure model has the key disadvantage that it cannot store data efficiently. It requires all data generated to be stored, to preserve the ability to examine the history of the program. Token models are inherently more efficient at storing data.
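The trade-off between the two models can be made concrete with a small sketch. The class names here are invented for illustration; each models one arc:

```python
# Sketch: the same arc under the token model and the structure model.

class TokenArc:
    """Token model: a value is consumed when read, so storage stays
    small but history is lost."""
    def __init__(self):
        self.queue = []

    def put(self, v):
        self.queue.append(v)

    def take(self):
        return self.queue.pop(0)   # consuming removes the token

class StructureArc:
    """Structure model: every value is retained, permitting random
    access and history sensitivity, at the cost of storing everything."""
    def __init__(self):
        self.values = []

    def put(self, v):
        self.values.append(v)

    def at(self, i):
        return self.values[i]      # random access into the history

t, s = TokenArc(), StructureArc()
for v in (10, 20, 30):
    t.put(v)
    s.put(v)
first = t.take()   # gone from the token arc, still held by the structure
```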

 

5. Two approaches to execution activation: the data availability approach and the data demand approach. In data availability, each node fires when data becomes available or changes. In the demand approach, a node fires when it receives a request for data from one of its output arcs.
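The two activation disciplines can be sketched for a single node computing x + y. The helper names are invented; this is a toy model of the idea, not an implementation from the paper:

```python
# Sketch: data-driven vs. demand-driven activation of one node.

# Data availability: the node fires as soon as all operands are present.
def data_driven(inputs, node):
    if all(v is not None for v in inputs.values()):
        return node(**inputs)
    return None   # not all operands have arrived yet; node stays idle

# Data demand: nothing fires until an output is requested; the request
# propagates backwards and pulls the inputs in.
def demand_driven(make_x, make_y, node):
    return lambda: node(x=make_x(), y=make_y())

add = lambda x, y: x + y
eager = data_driven({"x": 2, "y": 3}, add)          # fires immediately
lazy = demand_driven(lambda: 2, lambda: 3, add)     # no work done yet
result = lazy()                                     # request triggers firing
```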

 

The paper describes a number of visual dataflow programming languages, notably LabView and ProGraph.

 

Four research areas of data flow:

 

—the provision of iteration in textual dataflow languages,

 

The problem is the nasty requirement that variables be assigned only once, which makes an index variable difficult. The solution is a statement of the form new x = x + 1, meaning that in the next iteration, x will be x + 1.
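One way to picture the new x = x + 1 convention is that each iteration binds a fresh set of single-assigned values rather than overwriting the old ones. A minimal sketch, with an invented `iterate` helper and an illustrative loop body:

```python
# Sketch: iteration under single assignment. No binding is ever mutated;
# each pass produces a brand-new environment, as in "new x = x + 1".

def iterate(env, body, cond):
    """Run body until cond fails; body returns a whole new environment
    of "new" bindings instead of updating the old one in place."""
    while cond(env):
        env = body(env)
    return env

# new x = x + 1; new sum = sum + x   (illustrative loop body)
body = lambda e: {"x": e["x"] + 1, "sum": e["sum"] + e["x"]}
final = iterate({"x": 0, "sum": 0}, body, lambda e: e["x"] < 4)
```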

 

—iteration structures in DFVPLs,

 

The solution to this problem that makes the most sense is the use of embedded iteration objects, as implemented in LabView.

           

—the use of data structures: The main problem here is that dataflow requires a data element to be assigned a value only once. The solutions have been:

1. using pointers and reference counts to keep track of assignment changes;

2. I-structures, where variables are created with undefined values, and only variables that are undefined may take on an assignment;

3. a hybrid of the two, where a new copy of the variable is created when an assignment is made to a variable with more than one reference.
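The I-structure idea in particular is easy to sketch: a cell that starts undefined, accepts exactly one write, and defers any reads that arrive early. The class below is an invented single-threaded toy (real I-structures queue waiting consumer processes):

```python
# Sketch of an I-structure cell: write-once, with deferred reads.

class ICell:
    _UNDEFINED = object()

    def __init__(self):
        self.value = ICell._UNDEFINED
        self.deferred = []          # readers waiting for the value

    def write(self, v):
        if self.value is not ICell._UNDEFINED:
            raise ValueError("I-structure cell already assigned")
        self.value = v
        for callback in self.deferred:   # release deferred reads
            callback(v)
        self.deferred = []

    def read(self, callback):
        if self.value is ICell._UNDEFINED:
            self.deferred.append(callback)   # too early: defer the read
        else:
            callback(self.value)

cell, seen = ICell(), []
cell.read(seen.append)   # read arrives before the write: deferred
cell.write(42)           # the single permitted write releases it
```

A second `write` would raise, which is how the single-assignment rule is enforced.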

 

—nondeterminism.

 

Addition of random merge objects seems to be the solution of choice.
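A random merge node can be sketched as a node that forwards whichever input token it happens to pick first, so the interleaving of the output stream is not determined by the program. The function name and two-input restriction are illustrative; the paper only names random merge as the mechanism:

```python
import random

# Sketch: a nondeterministic (random) merge of two token streams.
# Tokens from each stream keep their relative order, but the
# interleaving between streams is left to chance.

def nondeterministic_merge(stream_a, stream_b, rng=random):
    a, b = list(stream_a), list(stream_b)
    out = []
    while a or b:
        # choose nondeterministically among arcs that still hold a token
        src = rng.choice([s for s in (a, b) if s])
        out.append(src.pop(0))
    return out

merged = nondeterministic_merge(["a1", "a2"], ["b1"])
```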