Dataflow languages provide a programming paradigm based on interconnected components which process data as it passes through them. Conceptually, the paradigm models data as something that originates from a source, flows through a number of processing components that manipulate it (e.g., by changing or duplicating it), and arrives at some final destination. As such, the dataflow paradigm is most suitable when developing applications that are themselves focused on the "flow" of data.

Perhaps the most readily available examples of dataflow-oriented applications come from the realm of signal processing, e.g., a video signal processor which starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display.

A motivating example

Let's take a simple real-time camera-input-displayed-on-the-screen application. Suppose there are three parts to the application - getting each image frame from the camera, processing the image in some way, and displaying it on the screen. To see how we might arrive at a dataflow-oriented implementation of this application, let's first begin with an imperative approach. Such an implementation might be structured as follows:

[Figure: dataflow1]

Basically, the main program loop is a series of instructions which does this particular job. If you wanted to be a little more object-oriented, you could encapsulate all this functionality in an Image class and just call the appropriate methods from the main program loop, which would give you an object-oriented imperative solution.
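As a rough illustration of the imperative structure, here is a minimal C++ sketch. The `Image`, `Camera`, `Screen`, and `process` names are hypothetical stand-ins for whatever camera, image processing, and GUI libraries an application would really use; none of them come from an actual library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the camera, image processing, and GUI libraries.
struct Image { std::vector<int> pixels; };

struct Camera {
    int next = 0;
    // Pretend to grab a frame: four pixels, all set to a rising counter value.
    Image grab_frame() { return Image{std::vector<int>(4, next++)}; }
};

// "Processing" here simply inverts 8-bit pixel values.
Image process(Image img) {
    for (int& p : img.pixels) p = 255 - p;
    return img;
}

struct Screen {
    std::vector<Image> shown;  // records every frame that was displayed
    void display(const Image& img) { shown.push_back(img); }
};

// The imperative main loop: the loop itself drives every step, in order.
void run_imperative(Camera& camera, Screen& screen, std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i) {
        Image frame = camera.grab_frame();  // 1. get the frame from the camera
        Image result = process(frame);      // 2. process the image
        screen.display(result);             // 3. display it on the screen
    }
}
```

Note that the loop has to know about all three parts of the application and the order in which they run.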

However, let's take this a step further in the dataflow direction. Video input libraries often provide callback functionality which will deliver a video stream as a sequence of image frames given at the appropriate frame rate. With this in mind, we do the following:

  1. implement a function which takes an image as input.
    1. the function first invokes the image processing library function that modifies the image as appropriate
    2. the function then invokes the GUI library function that displays the image
  2. register the function as a callback with the camera library
  3. the main program loop can relax and have some coffee.

The situation now looks something like this:

[Figure: dataflow2]

So now, the image library is acting as a data/signal generator - it generates images at a certain frame rate. And the function we implemented seems to be a signal consumer which can take an image, process it, and display it on the screen.
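The callback arrangement could be sketched as follows. `CallbackCamera`, `register_callback`, and `emit_frame` are hypothetical names invented for this illustration; a real camera library would have its own registration interface and would deliver frames from its own capture loop rather than on demand.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { std::vector<int> pixels; };

// Hypothetical camera library that pushes frames to a registered callback.
class CallbackCamera {
public:
    using FrameCallback = std::function<void(const Image&)>;

    void register_callback(FrameCallback cb) { callback_ = std::move(cb); }

    // A real library would invoke the callback itself at the frame rate;
    // in this sketch we trigger a frame manually.
    void emit_frame(int value) {
        assert(callback_ && "register a callback before frames arrive");
        callback_(Image{std::vector<int>(4, value)});
    }

private:
    FrameCallback callback_;
};

// Stand-in for the image processing library: inverts 8-bit pixel values.
Image process(Image img) {
    for (int& p : img.pixels) p = 255 - p;
    return img;
}

// Stand-in for the GUI library: records what was displayed.
struct Screen {
    std::vector<Image> shown;
    void display(const Image& img) { shown.push_back(img); }
};

// Steps 1 and 2 of the list: build a function that processes and then
// displays an image, and register it as the camera callback.
// The screen is captured by reference, so it must outlive the camera.
void install_consumer(CallbackCamera& camera, Screen& screen) {
    camera.register_callback([&screen](const Image& frame) {
        screen.display(process(frame));
    });
}
```

After `install_consumer` returns, nothing in the main program has to pull frames; the camera (the signal generator) drives the consumer.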

This now employs the basic elements of the dataflow paradigm, but we could take it even further. Instead of just having two components, one image signal generator and one signal consumer, how about this:

  1. implement a component which accepts an input image signal, modifies the image as appropriate, and then outputs a signal with the modified image
  2. implement a component which receives an input image signal and displays it on the screen
  3. connect the camera library input stream to the first component
  4. connect the first component to the second component
  5. the main program loop can relax and have some tea, or even take a nap.

The big picture now looks like the following:

[Figure: dataflow3]
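To make the component-based version concrete, here is a minimal sketch using a hand-rolled output port. The `Output`, `CameraSource`, `Inverter`, `Display`, and `wire` names are all assumptions made for this illustration; this is not the Signal Network or Boost.Signals API, just the same idea in a few lines of standard C++.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { std::vector<int> pixels; };

// A minimal output port: every connected consumer receives each emitted value.
template <typename T>
class Output {
public:
    void connect(std::function<void(const T&)> consumer) {
        consumers_.push_back(std::move(consumer));
    }
    void emit(const T& value) const {
        for (const auto& consumer : consumers_) consumer(value);
    }
private:
    std::vector<std::function<void(const T&)>> consumers_;
};

// Stand-in for the camera library's input stream.
struct CameraSource {
    Output<Image> out;
    void capture(int value) { out.emit(Image{std::vector<int>(4, value)}); }
};

// Component 1: accepts an image, modifies it, and outputs the result.
struct Inverter {
    Output<Image> out;
    void operator()(const Image& in) {
        Image result = in;
        for (int& p : result.pixels) p = 255 - p;
        out.emit(result);
    }
};

// Component 2: accepts an image and "displays" it (here, records it).
struct Display {
    std::vector<Image> shown;
    void operator()(const Image& in) { shown.push_back(in); }
};

// Connect an output port to a consumer component.
// The consumer is held by reference, so it must outlive the port.
template <typename Consumer>
void wire(Output<Image>& from, Consumer& to) {
    from.connect([&to](const Image& img) { to(img); });
}
```

With these pieces, steps 3 and 4 of the list become `wire(camera.out, inverter);` and `wire(inverter.out, display);`. Neither component names the other in its own code; only the wiring does.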

Advantages

There are already many programming paradigms supported by C++ (either directly or through additional libraries), so let's examine what the advantages of the dataflow paradigm might be.

First of all, dataflow programming is not exclusive of other techniques, so adopting the dataflow paradigm does not hinder the use of other approaches. In fact, in C++ it can't - since the components themselves need to be implemented somehow, and we can't recursively define them forever as finer and finer dataflow diagrams, the dataflow paradigm relies on other programming techniques to do the underlying work. Also, dataflow can only be a part of a complete application implementation. You can always "extend your fingers" from other parts of the program in order to insert data into the dataflow, catch it on the other end, probe and adjust the components, etc. You can think of it as working with electronic components and changing the connections, turning knobs, flipping switches, or hanging over a circuit board and tinkering with it using, say, a multimeter and a 5V lead (just do it with care).

Second, dataflow promotes some good programming practices. When developing processing components, we have only the incoming data to deal with, with no requirements on where it is coming from. Hence, the developed components tend to be quite versatile and reusable. In the above example, the image processing component can be used with any image data generator: there is nothing inside the component that says "get the image from the camera", or "get the image from this type of data source (where the type is either a base class or a concept)". It just does its thing, no matter where the data is coming from.

Third, when used in the right context, dataflow programming makes development and maintenance very intuitive. In the image processing example, say you don't want to process the image any more. You can just connect the camera signal directly to the screen display and cut out the image processing component. Someone gives you a new video signal generator component you'd like to use as input instead of the camera? Just plug it in. Literally.
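A small sketch of what such rewiring might look like, again with a hypothetical hand-rolled `Output` port (the `disconnect_all` member and the two wiring functions are assumptions made for this example, not part of any library). The point is that only the connection code changes; no component is edited.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { std::vector<int> pixels; };

// Minimal output port with the ability to drop all of its connections.
template <typename T>
class Output {
public:
    void connect(std::function<void(const T&)> consumer) {
        consumers_.push_back(std::move(consumer));
    }
    void disconnect_all() { consumers_.clear(); }
    void emit(const T& value) const {
        for (const auto& consumer : consumers_) consumer(value);
    }
private:
    std::vector<std::function<void(const T&)>> consumers_;
};

struct Source {
    Output<Image> out;
    void capture(int value) { out.emit(Image{std::vector<int>(4, value)}); }
};

struct Inverter {
    Output<Image> out;
    void operator()(const Image& in) {
        Image result = in;
        for (int& p : result.pixels) p = 255 - p;
        out.emit(result);
    }
};

struct Display {
    std::vector<Image> shown;
    void operator()(const Image& in) { shown.push_back(in); }
};

// Wiring A: source -> inverter -> display.
void wire_with_processing(Source& s, Inverter& f, Display& d) {
    s.out.disconnect_all();
    f.out.disconnect_all();
    s.out.connect([&f](const Image& img) { f(img); });
    f.out.connect([&d](const Image& img) { d(img); });
}

// Wiring B: source -> display, with the processing component cut out.
void wire_direct(Source& s, Display& d) {
    s.out.disconnect_all();
    s.out.connect([&d](const Image& img) { d(img); });
}
```

Switching from wiring A to wiring B at run time is a single call to `wire_direct`; swapping in a different source would be just as local a change, provided it exposes a compatible output.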

Fourth, dataflow-oriented programs can be divided between threads, processors, or computers more easily, because the data dependencies are much more visible. In the image processing example, say you have the display on a different computer. You can just pass the connection to it through a network socket. With the data flow clearly specified, it is much easier to distribute the work either manually or even automatically (although the Signal Network library at the moment offers no such automatic functionality).

Finally, we are not too far from the advantages of a visual programming language, since the components and the connections have a natural graphical representation. With a visual development environment, programming becomes as easy as linking components with connections (again, the Signal Network library provides no visual programming functionality).

Go with the flow?

If you are interested in exploring the dataflow concept further, see

  • generic approach to dataflow in C++.
  • using Boost.Signals as a data transport mechanism.