A generic approach to dataflow

Note

A lot of the ideas in this section have come from discussions with Tobias Schwinger and Douglas Gregor, whom I thank sincerely for their continuing help and suggestions.

The Signal Network library started as a way to facilitate dataflow programming by providing components and connections which allow large-scale use of Boost.Signals as a mechanism to model the transfer of data between processing components. However, during the planning, design and development of the library, it became apparent that

Hence, the Signal Network library is planned to be redesigned into a generic dataflow library (probably called Dataflow), and offer individual mechanism-specific modules (the Signal Network library being one of them).

The design of the Dataflow library might look something like this:

Table 1.1. Dataflow library tentative design

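+-----------+-----------+--------------------+
| operators | blueprint | component          |
+-----------+-----------+--------------------+
| connect               | mechanism-specific |
+-----------------------+--------------------+
| support                                    |
+--------------------------------------------+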

The layers form a bottom-up hierarchy, with each layer depending only on the layers beneath it. The support layer provides the generic traits and functions that generic code needs in order to work with mechanism-specific components. Each type of mechanism or component must specialize the elements of the support layer for that mechanism or component. The connect layer provides the generic connect free function, which is then used by the operators layer and the envisioned blueprint layer. Each mechanism might have its own mechanism-specific layer, on top of which a mechanism-specific component can be implemented. As long as the support layer is implemented for the mechanism or component, the component should work seamlessly with the rest of the Dataflow library.
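To make the layering more concrete, here is a minimal sketch of how a support-layer trait could drive a generic connect function. All names in it (dataflow_sketch, mechanism_of, connect_impl, callback_mechanism, broadcaster, accumulator) are hypothetical and chosen only for illustration; they are not the planned Dataflow interface.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace dataflow_sketch {

// --- support layer (hypothetical names) ---
// Tag identifying one data transport mechanism.
struct callback_mechanism {};

// Trait each producer type specializes to declare its mechanism.
template <typename Component>
struct mechanism_of;

// Hook each mechanism specializes to perform the actual connection.
template <typename Mechanism>
struct connect_impl;

// --- connect layer ---
// Generic free function: looks up the producer's mechanism and dispatches.
template <typename Producer, typename Consumer>
void connect(Producer& producer, Consumer& consumer)
{
    connect_impl<typename mechanism_of<Producer>::type>::apply(producer, consumer);
}

// --- mechanism-specific layer and components ---
// A producer that pushes values to registered callbacks.
struct broadcaster
{
    std::vector<std::function<void(int)>> targets;
    void send(int value) { for (auto& t : targets) t(value); }
};

// A consumer that sums up everything it receives.
struct accumulator
{
    int total = 0;
    void operator()(int value) { total += value; }
};

template <>
struct mechanism_of<broadcaster> { typedef callback_mechanism type; };

template <>
struct connect_impl<callback_mechanism>
{
    template <typename Producer, typename Consumer>
    static void apply(Producer& producer, Consumer& consumer)
    {
        // Stores a reference: the consumer must outlive the producer.
        producer.targets.push_back(std::ref(consumer));
    }
};

// Usage: generic code calls connect() without knowing the mechanism.
int demo_connect()
{
    broadcaster source;
    accumulator sink;
    connect(source, sink);
    source.send(2);
    source.send(3);
    return sink.total;
}

} // namespace dataflow_sketch
```

In this sketch, supporting a new mechanism would mean adding a new tag plus specializations of mechanism_of and connect_impl, while the generic connect function stays unchanged.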

Essentially, the support layer is a very minimal layer implementing a generic and extensible intrusive directed graph framework. The blueprint layer could be implemented as a BGL graph over a hierarchy of classes with a common interface (implemented using a base class with virtual methods) which provides serialization, instantiation, and (if applicable) execution capability for an entire network.

The different data transport mechanisms

To explore the concept of generic dataflow in C++ further, let us take a step back and examine where the flow of data happens in C++. Rather informally, we can divide things as follows:

Data can then be processed by the computational components by applying the components to the data in some appropriate way. There are several ways of providing data to the computational components:

Similarly, there are several ways of getting data out of a computational component:

Another important thing to note is that there is a separation between providing the data to a component and invoking the component. In some cases, the two happen simultaneously (when the data is passed and returned via a function call), while in others invoking the computational component can happen separately from providing the data (at least from a modeling perspective). An example of the latter is a computational component which uses data stored in a particular location. We can place the data in that location well before invoking the computational component.
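The distinction between providing data and invoking a component can be illustrated with a small sketch. The names doubled, doubler and demo_decoupled are hypothetical and exist only for this example.

```cpp
#include <cassert>

// Coupled: the function call both delivers the data and invokes the computation.
int doubled(int input) { return 2 * input; }

// Decoupled: data is placed in a location first; invocation happens later.
struct doubler
{
    int input = 0;   // location the component reads from
    int output = 0;  // location the component writes to
    void invoke() { output = 2 * input; }
};

int demo_decoupled()
{
    doubler component;
    component.input = 21;  // provide the data...
    // ...arbitrary other work could happen here...
    component.invoke();    // ...and invoke the component separately
    return component.output;
}
```

Both forms compute the same result; they differ only in whether delivering the data and running the computation are a single step or two.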

The Signal Network library relies on moving the data via function parameters and return values. Here, the Boost.Signals library is used to model these individual data channels which couple the data transfer and the computational component invocation. This is one possible dataflow-oriented approach.

There is another approach, proposed by Tobias Schwinger, in which the computational components are connected by "pins". Here, the data is communicated by placing it in locations where the components will read it, and reading it from locations where the components write it. Also, rather than the components activating each other, the network itself activates the components in an optimized order.

Each of these approaches has different properties. In the signal-based approach, the knowledge of the network is local - each component knows about where its signals are going, but it knows nothing of where the signals arriving at its own slots are coming from. Unless we record how the network was constructed, there is no "big picture" of what the complete network looks like. Similarly, the network is executed autonomously - the components invoke one another when appropriate, and no external control mechanism is required.
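The local knowledge and autonomous execution of the signal-based approach can be sketched without any library support. The simple_signal class below is a deliberately minimal stand-in written for this example; it is not the Boost.Signals API, and add_one, collector and demo_signal_chain are likewise hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Minimal stand-in for a signal: a list of slots called in connection order.
// This is NOT the Boost.Signals API.
template <typename Arg>
class simple_signal
{
public:
    void connect(std::function<void(Arg)> slot) { slots_.push_back(std::move(slot)); }
    void operator()(Arg value) const { for (const auto& s : slots_) s(value); }
private:
    std::vector<std::function<void(Arg)>> slots_;
};

// A filter knows only where its output goes, not who calls it.
struct add_one
{
    simple_signal<int> out;
    void operator()(int value) { out(value + 1); }
};

// Terminal component that records what arrives.
struct collector
{
    std::vector<int> received;
    void operator()(int value) { received.push_back(value); }
};

std::vector<int> demo_signal_chain()
{
    simple_signal<int> source;
    add_one first, second;
    collector sink;

    // Each connection is stored only inside the sending component.
    source.connect(std::ref(first));
    first.out.connect(std::ref(second));
    second.out.connect(std::ref(sink));

    source(10);  // one call; the components then invoke one another
    source(20);
    return sink.received;
}
```

Note that nothing in this sketch holds a description of the whole chain: once the connections are made, the only way to recover the topology would be to have recorded it during construction.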

In the pin-based approach, the situation is reversed. There is a "big picture" of what the complete network looks like, and the network control mechanism uses this information to decide when a component should be invoked and to manage the data shared between the components via pins. Global knowledge of the network can be used for better optimization, serialization, etc. However, it may come at the price of some intrusiveness to the computational components.
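For contrast, here is a rough sketch of the pin-based idea, heavily simplified. It assumes that a pin is just a shared storage location and that the network is handed its components already in dependency order; the names pin, component, add_one, network and demo_pin_network are hypothetical and do not reflect Tobias Schwinger's actual design, which also envisions buffer and pin management.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A pin is modeled here as a plain storage location shared by two components.
struct pin { int value = 0; };

// Common interface so the network can activate any component.
// Deriving from it is the "intrusive" part of this approach.
struct component
{
    virtual ~component() {}
    virtual void invoke() = 0;
};

// Reads from one pin and writes to another; it never calls other components.
struct add_one : component
{
    const pin* in;
    pin* out;
    add_one(const pin* i, pin* o) : in(i), out(o) {}
    void invoke() override { out->value = in->value + 1; }
};

// The network holds the global picture, here reduced to an activation order.
class network
{
public:
    void add(component* c) { order_.push_back(c); }
    void run()
    {
        for (std::size_t i = 0; i < order_.size(); ++i) order_[i]->invoke();
    }
private:
    std::vector<component*> order_;  // assumed to be in dependency order
};

int demo_pin_network()
{
    pin a, b, c;
    add_one first(&a, &b), second(&b, &c);

    network net;
    net.add(&first);
    net.add(&second);

    a.value = 10;  // place the data in the input pin
    net.run();     // the network, not the components, drives execution
    return c.value;
}
```

Because the network object sees every component, it is the natural place to add scheduling, optimization or serialization logic in a fuller design.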

What's happening here for Google Summer of Code?

Even though the signal-based approach and the pin-based approach are quite different, they aim for the same goal: a dataflow-oriented approach using the C++ language. It is possible that an application designed in a dataflow-oriented way could be implemented using either of the approaches, that one approach could be converted to the other, or that using both approaches simultaneously is in fact the best solution. Even though investigating the relationships between these two frameworks, or implementing both, would be outside the scope of this Google Summer of Code project, I believe it is worthwhile to keep the connections in mind while designing and implementing the Signal Network library.

To this end, I have started implementing a very simple framework that is based on phoenix2 actors and uses plain pointers to indicate the flow of data. This is somewhat related to the pin-based approach (but much simpler, and without support for things like the buffer/pin management which Tobias has envisioned), so it is helping me see what tends to be similar across data transport mechanisms and what tends to be different. This additional framework, as well as the generic Dataflow layers, will be committed to the repository shortly (mid-July), after I get a chance to integrate everything in a joint design.

Copyright © 2007 Stjepan Rajko
