This paper is an overview of dataflow programming architectures. It traces the formation of current dataflow thought from its origins to the present.
It is interesting to see the two perspectives of research in this area. One camp designs textual or visual dataflow languages to expose parallelism while preserving determinism, so that programs remain amenable to formal proof and analysis. The other camp simply wants to write practical, efficient code. The former works from theory, proof, and mathematical rigor; the latter from practicality. The paper refers to this at several points, and explicitly recognizes it in the section on non-determinism:
The dichotomy in regard to nondeterminism
appears to be the result of a division
between those who wish to use
dataflow as a means to ease the formal
proof and analysis of programs and those
who wish to use dataflow for all varieties
of programming problems. The former regard
determinacy as essential, whereas
the latter regard the provision of nondeterminacy …
It is unfortunate that Max/MSP and Pd are conspicuously missing from the paper's discussion.
0. The "no change" of value requirement seems to be the hardest issue to get around in pure data flow languages.
The best list of features that constitute
a dataflow language was put forward
by Ackerman and reiterated
by Whiting and Pascoe and Wail
and Abramson. This list includes
(1) freedom from side effects,
(2) locality of effect,
(3) data dependencies equivalent to scheduling,
(4) single assignment of variables,
(5) an unusual notation for iterations due
to features 1 and 4,
(6) lack of history sensitivity in procedures.
Because scheduling is determined from
data dependencies, it is important that
the values of variables do not change between
their definition and their use.
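The point about scheduling falling out of data dependencies can be made concrete with a small sketch. This is not from the paper; the graph, node names, and evaluator are invented for illustration. Each "instruction" declares only which values it consumes, every name is written exactly once, and execution order is derived entirely from readiness of inputs:

```python
# Hypothetical sketch: scheduling derived purely from data dependencies.
# Each node names its inputs; nothing else constrains the ordering.
graph = {
    "a": ([], lambda: 2),                   # a = 2
    "b": ([], lambda: 3),                   # b = 3
    "c": (["a", "b"], lambda a, b: a + b),  # c = a + b
    "d": (["c", "a"], lambda c, a: c * a),  # d = c * a
}

def evaluate(graph):
    values = {}              # single assignment: each name is written once
    pending = dict(graph)
    while pending:
        for name, (deps, fn) in list(pending.items()):
            if all(d in values for d in deps):   # fire when inputs are ready
                values[name] = fn(*(values[d] for d in deps))
                del pending[name]
    return values

print(evaluate(graph)["d"])  # 10
```

Because no value ever changes after it is defined, any firing order consistent with the dependencies gives the same result, which is exactly why the "no change" requirement matters.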
1. It was realized that the parallelism used
in dataflow architectures operated at too
fine a grain and that better performance
could be obtained through hybrid von
Neumann dataflow architectures. Many
of these architectures [Bic 1990] took
advantage of more coarse-grained parallelism
where a number of dataflow instructions
were grouped and executed in …
2. It is clear that dataflow provides the
potential to provide a substantial speed
improvement by utilizing data dependencies
to locate parallelism.
3. About the flow of data through a fork:
… from input Y signify that that value is to
be duplicated. Forking arcs in this manner
is essential if a data token is needed
by two different nodes. A data token
that reaches a forked arc gets duplicated
and a copy sent down each branch. This
preserves the data independence and the
functionality of the system.
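The duplication rule in the quote can be sketched in a few lines. This is an illustrative model, not the paper's notation: arcs are stood in for by queues, and each branch of the fork receives a deep copy of the token so that neither consumer can observe the other's mutations, preserving the data independence the quote describes:

```python
import copy
from queue import Queue

# Hypothetical sketch of a forked arc: one incoming token is copied onto
# each branch queue so both consumers see the same value independently.
def fork(token, branches):
    for arc in branches:
        arc.put(copy.deepcopy(token))  # each branch gets its own copy

left, right = Queue(), Queue()
fork({"y": 42}, [left, right])
assert left.get() == right.get() == {"y": 42}
```

The deep copy is the important detail: if both branches shared one mutable token, a side effect in one node could leak into the other, breaking the functional character of the graph.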
4. Alternate dataflow models: two types of data flow are described. In the token model, data is passed along from node to node, duplicated at forks, and merged or switched at joins. In the structure model, each node creates its own representation of the data, keeping its own copy.
In the structure
model, however, each node creates only
one data object on each arc that remains
there: the node builds one or more data
structures on its output arcs. It is possible
for these structures to hold infinite
arrays of values, permitting open-ended
execution, and creating the same effect as
the token model, but with the advantage
that the structure model permits random
access of the data structures and history …
However, the structure model has the
key disadvantage that it cannot store data
efficiently. It requires all data generated
to be stored, to preserve the ability to examine
the history of the program. Token
models are inherently more efficient at …
5. Two approaches to execution activation: the data-availability approach and the data-demand approach. In the availability approach, a node fires when its input data becomes available or changes. In the demand approach, a node fires when it receives a request for data on one of its output arcs.
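The two activation rules can be contrasted with a minimal sketch for a single node computing c = a + b. The function names and shapes here are invented for illustration, not taken from the paper:

```python
# Hypothetical contrast of the two activation rules for one node, c = a + b.

# Data-driven (availability): the node fires as soon as all inputs are present.
def data_driven(inputs, node):
    if all(v is not None for v in inputs.values()):
        return node(**inputs)          # fire now
    return None                        # not ready yet

# Demand-driven: nothing runs until an output is requested; the request
# propagates backwards and pulls the inputs in.
def demand_driven(make_a, make_b):
    def on_demand():                   # called only when a consumer asks for c
        return make_a() + make_b()
    return on_demand

print(data_driven({"a": 1, "b": 2}, lambda a, b: a + b))  # 3
c = demand_driven(lambda: 1, lambda: 2)
print(c())                                                # 3
```

The availability rule pushes values forward through the graph; the demand rule pulls them backward from the requested output, which is why it pairs naturally with lazy evaluation.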
The paper describes a number of visual dataflow programming languages, among them LabView and ProGraph.
Four research areas of data flow:
—the provision of iteration in textual dataflow languages,
The problem is the single-assignment requirement: a variable may be assigned only once, which makes a conventional index variable impossible. The solution is a statement of the form new x = x + 1, meaning that in the next iteration, the fresh instance of x takes the value x + 1.
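One way to read the new x = x + 1 construct is that every loop iteration binds a fresh, immutable instance of x rather than mutating a single variable. A minimal sketch of that reading, using recursion in place of the dataflow loop (the loop bound and function name are invented for illustration):

```python
# Hypothetical reading of the textual-dataflow loop
#     initial x = 0; while x < 5 do new x = x + 1
# Each call creates a *new* binding of x; nothing is ever reassigned.
def loop(x):             # x is this iteration's single-assignment instance
    if x < 5:
        new_x = x + 1    # "new x": the next iteration's instance
        return loop(new_x)
    return x

print(loop(0))  # 5
```

Each instance of x is assigned exactly once, so the single-assignment rule is respected even though the program iterates.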
—iteration structures in DFVPLs,
The solution to this problem that makes the most sense is the use of embedded iteration objects, as implemented in LabView.
—the use of data structures: the main problem is again that dataflow requires a data element to be assigned a value only once. The solutions have been:
1. using pointers and reference counts to track assignment changes;
2. I-structures, where variables are created with undefined values, and only undefined variables may take on an assignment;
3. a hybrid of the two, where a new copy of the variable is created when an assignment is made to a variable with more than one reference.
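The I-structure idea in option 2 can be sketched as a write-once cell. This is an invented illustration: real I-structure implementations typically make a reader of an undefined slot block until the value arrives, whereas this sketch simply raises:

```python
# Hypothetical I-structure cell: starts undefined, accepts exactly one write.
class ICell:
    _UNDEF = object()

    def __init__(self):
        self._value = ICell._UNDEF     # created with an undefined value

    def write(self, value):
        if self._value is not ICell._UNDEF:
            raise RuntimeError("cell already defined")  # single assignment
        self._value = value

    def read(self):
        if self._value is ICell._UNDEF:
            raise RuntimeError("read of undefined cell")  # real systems block here
        return self._value

cell = ICell()
cell.write(7)
print(cell.read())  # 7
# A second cell.write(...) would raise: only undefined cells may be assigned.
```

The write-once discipline is what lets a structure be filled in incrementally while still looking, to every reader, like a value that was defined exactly once.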
—non-determinism: the addition of random merge objects seems to be the solution of choice.