The dataflow programming paradigm is based on interconnected components that process data as it passes through them. Basically, data is treated as something that originates from a source, flows through a number of processing components that manipulate it (e.g., by changing or duplicating it), and arrives at some final destination. As such, the dataflow paradigm is most suitable when developing applications that are themselves focused on the "flow" of data.
Perhaps the most readily available examples of dataflow-oriented applications come from the realm of real-time signal processing, e.g. a video signal processor which starts with a video input, modifies it through a number of processing components (video filters), and finally outputs it to a video display. Another example is event processing.
A motivating example
Let's take a simple real-time camera-input-displayed-on-the-screen application. Suppose there are three parts to the application - getting each image frame from the camera, processing the image in some way, and displaying it on the screen. To see how we might arrive at a dataflow-oriented implementation of this application, let's first begin with an imperative approach. Such an implementation might be structured as follows:
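A minimal sketch of such an imperative loop is shown here. The `Image`, `Camera`, `Screen`, and `process` names are hypothetical stand-ins for the camera, image processing, and GUI libraries, and an integer brightness value stands in for real pixel data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the three libraries involved.
struct Image { int brightness; };

struct Camera {
    int next = 0;
    // Pretend to capture a frame; each frame is one step brighter.
    Image get_frame() { return Image{next++}; }
};

// Pretend image processing: brighten the frame.
Image process(Image img) { img.brightness += 100; return img; }

struct Screen {
    std::vector<int> shown;
    // Pretend display: record what would have been drawn.
    void display(const Image& img) { shown.push_back(img.brightness); }
};

// The imperative main loop: fetch, process, display, repeat.
std::vector<int> run_imperative(std::size_t frame_count) {
    Camera camera;
    Screen screen;
    for (std::size_t i = 0; i < frame_count; ++i) {
        Image frame = camera.get_frame();
        Image processed = process(frame);
        screen.display(processed);
    }
    // Every captured frame was displayed exactly once.
    assert(screen.shown.size() == frame_count);
    return screen.shown;
}
```

Here the loop itself knows about all three parts and drives each of them in turn.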
Basically, the main program loop is a series of instructions which does this particular job. To take this a step further in the dataflow direction, we note that video input libraries often provide callback functionality which will deliver a video stream as a sequence of image frames given at the appropriate frame rate. With this in mind, we do the following:
- implement a function which takes an image as input.
- the function first invokes the image processing library function that modifies the image as appropriate
- the function then invokes the GUI library function that displays the image
- register the function as a callback with the camera library
- the main program loop can relax and have some coffee.
The situation now looks something like this:
So now, the camera library is acting as a data/signal producer - it generates images at a certain frame rate. And the function we implemented is a signal consumer which can take an image, process it, and display it on the screen.
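The callback arrangement can be sketched as follows. The `Camera` class and its `register_callback` and `deliver_frames` members are hypothetical and do not belong to any real camera library. A real library would typically invoke the callback from its own capture thread at the frame rate, whereas this sketch delivers frames synchronously.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { int brightness; };

// Hypothetical camera library offering a callback interface.
class Camera {
public:
    using Callback = std::function<void(const Image&)>;

    void register_callback(Callback cb) { callback_ = std::move(cb); }

    // Stand-in for the library's capture loop: push frames to the callback.
    void deliver_frames(int count) {
        assert(callback_ && "register a callback before capturing");
        for (int i = 0; i < count; ++i) callback_(Image{i});
    }

private:
    Callback callback_;
};

// Hypothetical image processing and GUI stand-ins.
Image process(Image img) { img.brightness += 100; return img; }

struct Screen {
    std::vector<int> shown;
    void display(const Image& img) { shown.push_back(img.brightness); }
};

std::vector<int> run_with_callback(int frame_count) {
    Camera camera;
    Screen screen;
    // The consumer: process each incoming image, then display it.
    camera.register_callback([&screen](const Image& img) {
        screen.display(process(img));
    });
    // From here on the camera (the producer) drives the program.
    camera.deliver_frames(frame_count);
    return screen.shown;
}
```

Note that the consumer no longer asks for frames; it only reacts to whatever the producer hands it.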
This now employs the basic elements of the dataflow paradigm, but we could take it even further. Instead of just having two components, one image signal generator and one signal consumer, how about this:
- implement a component which accepts an input image signal, modifies the image as appropriate, and then outputs a signal with the modified image
- implement a component which receives an input image signal and displays it on the screen
- connect the camera library input stream to the first component
- connect the first component to the second component
- the main program loop can relax and have some tea, or even take a nap.
The big picture now looks like the following:
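A library-independent sketch of the two-component arrangement might look like this. It does not use the Dataflow library; the `Output` template and the `Camera`, `Brighten`, and `Display` components are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { int brightness; };

// A component's output port: forwards each value to every connected receiver.
template <typename T>
class Output {
public:
    void connect(std::function<void(const T&)> receiver) {
        assert(receiver);  // an empty receiver cannot be called
        receivers_.push_back(std::move(receiver));
    }
    void send(const T& value) const {
        for (const auto& receiver : receivers_) receiver(value);
    }
private:
    std::vector<std::function<void(const T&)>> receivers_;
};

// Producer: stands in for the camera library's input stream.
struct Camera {
    Output<Image> out;
    void capture(int count) {
        for (int i = 0; i < count; ++i) out.send(Image{i});
    }
};

// First component: receives an image, modifies it, sends it on.
struct Brighten {
    Output<Image> out;
    void operator()(const Image& img) { out.send(Image{img.brightness + 100}); }
};

// Second component: receives an image and "displays" it.
struct Display {
    std::vector<int> shown;
    void operator()(const Image& img) { shown.push_back(img.brightness); }
};

std::vector<int> run_network(int frame_count) {
    Camera camera;
    Brighten brighten;
    Display display;
    camera.out.connect(std::ref(brighten));   // camera  --> brighten
    brighten.out.connect(std::ref(display));  // brighten --> display
    camera.capture(frame_count);
    return display.shown;
}
```

Neither `Brighten` nor `Display` refers to the camera; the connections are made entirely from the outside.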
To give you a sense of how you would do something like this using the Dataflow library, we will present a slightly simplified example using Dataflow.Signals. Instead of processing images, we will just process numbers - but the dataflow parts of the code are pretty much the same.
We will first define a few components to use for our network:
#include <boost/dataflow/signals/component/filter.hpp>

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/variate_generator.hpp>

#include <iostream>

using namespace boost;

// This will be our data processor. The signature void(double) designates
// the output signal (we will be sending out a double). The signals
// we can receive depend on how we overload operator().
class processor : public signals::filter<processor, void (double)>
{
public:
    // Initialize the Gaussian noise generator.
    processor() : generator(mt, dist) {}

    // Receive void(double) signals, add some Gaussian noise, and send
    // out the modified value.
    void operator()(double x)
    {
        out(x + generator());
    }

private:
    mt19937 mt;
    normal_distribution<> dist;
    boost::variate_generator<mt19937&, boost::normal_distribution<> > generator;
};

// This will be our data output. We just make a function object, and specify
// that it is a signal consumer by inheriting the signals::consumer class.
class output : public signals::consumer<output>
{
public:
    void operator()(double x)
    {
        std::cout << x << std::endl;
    }
};
And then connect them in a dataflow network:
#include <boost/dataflow/signals/component/storage.hpp>
#include <boost/dataflow/signals/component/timed_generator.hpp>
#include <boost/dataflow/signals/connection.hpp>

#include <boost/thread/thread.hpp>  // boost::thread::sleep
#include <boost/thread/xtime.hpp>   // boost::xtime, boost::xtime_get

#include "simple_example_components.hpp"

using namespace boost;

int main(int, char* [])
{
    // For our data source, we will use timed_generator,
    // which creates its own thread and outputs its stored value
    // at a specified time interval. We'll store a value of 0 to be sent out.
    // The signature void(double) specifies that the signal carries a double,
    // and that there is no return value.
    signals::timed_generator<void (double)> input(0);

    // Data processor and output:
    processor proc;
    output out;

    // ---Connect the dataflow network ---------------------
    //
    //     ,---------.     ,---------.     ,---------.
    //     |  input  | --> |  proc   | --> |   out   |
    //     `---------'     `---------'     `---------'
    //
    // -----------------------------------------------------
    input >>= proc >>= out;

    // If you prefer, you can also do:
    // connect(input, proc);
    // connect(proc, out);

    // Tell the source to start producing data, every 0.5s:
    input.enable(0.5);

    // take a little nap :-)
    // (Boost 1.50 and later spell the constant below boost::TIME_UTC_.)
    boost::xtime xt;
    boost::xtime_get(&xt, boost::TIME_UTC);
    xt.sec += 10;
    boost::thread::sleep(xt);

    input.join();
    return 0;
}
A sample run produces:
0.213436
-0.49558
1.57538
-1.0592
1.83927
1.88577
0.604675
...
...not quite image processing, but you get the (dataflow) point :-)
Advantages
There are already many programming paradigms supported by C++ (either directly or through additional libraries), so let's examine what the advantages of the dataflow paradigm might be.
First of all, dataflow programming is not exclusive of other paradigms, so adopting the dataflow paradigm does not hinder the use of other techniques. In fact, in C++ it can't - since the components themselves need to be implemented somehow, and we can't recursively define them forever as finer and finer dataflow diagrams, the dataflow paradigm relies on other programming techniques to do the underlying work. Also, dataflow does not need to be used for the entire application implementation. You can always use it for only those parts of the application it is appropriate for, and "extend your fingers" from other parts of the program in order to insert data into the dataflow, catch it on the other end, probe and adjust the components, etc. You can think of it as working with electronic components and changing the connections, turning knobs, flipping switches, or hanging over a circuit board and tinkering with it using, say, a multimeter and a 5V lead (just do it with care).
Second, dataflow promotes some good programming practices. When developing processing components, we have only the incoming data to deal with - with no requirements on where it is coming from. Hence, the developed components tend to be quite versatile and reusable. In the above example, the image processing component can be used with any image data generator - there is nothing inside the component that says "get the image from the camera", or "get the image from this type of data source (where the type is either a base class or a concept)". It just does its thing, no matter where the data is coming from.
Third, when used in the right context, dataflow programming makes development and maintenance very intuitive. In the image processing example, say you don't want to process the image any more. You can just connect the camera signal directly to the screen display and cut out the image processing component. Someone gives you a new video signal generator component you'd like to use as input instead of the camera? Just plug it in. Literally.
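To make the rewiring concrete, here is a small library-independent sketch. All names (`Output`, `Camera`, `TestPattern`, `Brighten`, `Display`, `run_network`) are hypothetical and are not part of the Dataflow library. The same processing and display components are wired to either source, with or without the processing step, and only the connection code changes.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct Image { int brightness; };

// A component's output port: forwards each value to every connected receiver.
template <typename T>
class Output {
public:
    void connect(std::function<void(const T&)> receiver) {
        assert(receiver);  // an empty receiver cannot be called
        receivers_.push_back(std::move(receiver));
    }
    void send(const T& value) const {
        for (const auto& receiver : receivers_) receiver(value);
    }
private:
    std::vector<std::function<void(const T&)>> receivers_;
};

// Two interchangeable image sources.
struct Camera {
    Output<Image> out;
    void run(int count) { for (int i = 0; i < count; ++i) out.send(Image{i}); }
};
struct TestPattern {
    Output<Image> out;
    void run(int count) { for (int i = 0; i < count; ++i) out.send(Image{255}); }
};

struct Brighten {
    Output<Image> out;
    void operator()(const Image& img) { out.send(Image{img.brightness + 100}); }
};

struct Display {
    std::vector<int> shown;
    void operator()(const Image& img) { shown.push_back(img.brightness); }
};

// Wire any source to the display, optionally through the processor.
template <typename Source>
std::vector<int> run_network(bool with_processing, int frame_count) {
    Source source;
    Brighten brighten;
    Display display;
    if (with_processing) {
        source.out.connect(std::ref(brighten));
        brighten.out.connect(std::ref(display));
    } else {
        source.out.connect(std::ref(display));  // cut out the processor
    }
    source.run(frame_count);
    return display.shown;
}
```

Calling `run_network<Camera>(false, 2)` skips the processing step, and `run_network<TestPattern>(true, 2)` swaps the source, without touching any component's implementation.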
Fourth, dataflow-oriented programs can be divided between threads, processors, or computers more easily, because the data dependencies are much more visible. In the image processing example, say you have the display on a different computer. You can just pass the connection to it through a network socket. With the data flow clearly specified, it is much easier to distribute the work either manually or even automatically (although the Dataflow library at the moment offers no such automatic functionality).
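As a rough illustration of splitting a dataflow between threads by hand, the sketch below places a thread-safe queue on the connection between a producer and a consumer. The `Channel` class and `run_two_threads` function are hypothetical and written with the standard library only; they are not something the Dataflow library provides. Integers stand in for image frames.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A hypothetical thread-safe connection between two components.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // sending after close() is a usage error
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signal that no more values will be sent.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a value arrives. Returns false once the channel
    // has been closed and all queued values have been consumed.
    bool receive(T& value) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        value = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

std::vector<int> run_two_threads(int frame_count) {
    Channel<int> frames;
    std::vector<int> shown;  // touched only by the consumer until join()

    // Producer thread: generates frames and sends them down the channel.
    std::thread producer([&frames, frame_count] {
        for (int i = 0; i < frame_count; ++i) frames.send(i);
        frames.close();
    });

    // Consumer thread: processes and "displays" frames as they arrive.
    std::thread consumer([&frames, &shown] {
        int frame = 0;
        while (frames.receive(frame)) shown.push_back(frame + 100);
    });

    producer.join();
    consumer.join();
    return shown;
}
```

Because the only data shared between the two stages travels over the connection, the queue is the single place that needs synchronization. On some platforms, building this requires a threading flag such as `-pthread`.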
Finally, we are not too far from the advantages of a visual programming language, since the components and the connections have a natural graphical representation. With a visual development environment, programming becomes as easy as wiring components together. In fact, the Dataflow library provides a small example illustrating this.
Go with the flow?
If you are interested in exploring dataflow programming further using the Dataflow library, see