Approaches to enforce real-time behavior in Ethernet
P. Pedreiras, L. Almeida , LSE-IEETA / DET, Universidade de Aveiro, Portugal

Abstract: Ethernet was originally designed to interconnect office equipment such as computers and printers. However, its wide availability, high bandwidth and low cost, made it appealing enough to be considered for use in other application domains, some of which presented real-time constraints, such as industrial and large embedded systems. Nevertheless, a direct use of Ethernet in those applications was seldom possible due to the unbounded network access delays that could arise from its medium access control protocol, i.e. CSMA/CD. Therefore, during its 30 years of existence, several proposals have been presented to achieve real-time behavior. This chapter presents an overview of the efforts towards real-time communication on Ethernet, including a brief description of several specific cases and highlighting some of the most recent advances in this area.

Keywords: Ethernet networks, fieldbuses, real-time communication, medium access control, real-time systems.

1  Introduction

Nowadays, intelligent nodes, i.e. microprocessor-based with communication capabilities, are extensively used in the lower layers of both process control and manufacturing industries [52]. In these environments, applications range from embedded command and control systems, to image processing, monitoring, human-machine interface, etc. Moreover, the communication between different nodes has specific requirements [5] that are quite different from, and sometimes opposed to, those found in office environments. For instance, predictability is favored against average throughput, and message transmission is typically time and precedence constrained. Furthermore, a lack of compliance with those constraints can have significant negative impact on the quality of the control action in distributed computer control systems (DCCS), or on the quality of the observation of the system state in distributed monitoring systems (DMS). Therefore, to deliver adequate quality of service, special-purpose networks have been developed, essentially during the last two decades, which are generically called fieldbuses and are particularly adapted to support frequent exchanges of small amounts of data under time, precedence and dependability constraints [52]. Probably, the most well known are PROFIBUS, WorldFIP, P‑NET, Foundation Fieldbus, TTP/C, CAN and CAN-based systems such as DeviceNet.

In the early days of DCCS, network nodes presented simple interfaces, and supported limited sets of actions. However, the quantity, complexity and functionality of nodes on a DCCS have been increasing steadily. As a consequence of this evolution, the amount of information that must be exchanged over the network has also increased, either for configuration as well as for operational purposes.

The increase in the amount of data exchanged among DCCS nodes is reaching limits that are achievable using traditional fieldbuses due to limited bandwidth, typically between 1 and 5 Mbps [5]. Machine vision is just one example of a killer application for those systems. Therefore, other alternatives are required to support higher bandwidth demands while retaining the main requirements of a real-time communication system: predictability, timeliness, bounded delays and jitter.

From the 1980’s, several general-purpose networks, exhibiting higher bandwidth than the traditional fieldbus protocols, have also been proposed for use at the field level. For example, two prominent networks FDDI and ATM have been extensively analyzed for both hard and soft real-time communication systems [50]. However, due to high complexity, high cost, lack of flexibility and interconnection capacity [50], these protocols have not gained general acceptance.

Another communication protocol that has been evaluated for use at the field level is Ethernet. Main factors that favor the use of this protocol are [5]: cheap silicon available, easy integration with Internet, clear path for future expandability, and compatibility with networks used at higher layers on the factory structure.

However, the non-deterministic arbitration mechanism used by Ethernet impedes its direct use at field level, at least for hard real-time communications. Therefore, in the past, several attempts have been made to allow Ethernet to support time-constrained communications. The methods that have been used to achieve deterministic message transmission over Ethernet range from modifications of the Medium Access Control (MAC) layer (eg. [28]), to the addition of sub-layers over the Ethernet layer to control the time instants of message transmission (eg. [54]) and therefore avoid collisions. More recently, with the advent of switched Ethernet, and therefore the intrinsic absence of collisions, a new set of works concerning the ability of this topology to carry time-constrained communications has appeared (e.g. [50]).

This chapter presents a brief description of the Ethernet protocol, followed by a discussion of several techniques that have been proposed or used to enforce real-time communication capabilities over Ethernet during the last two decades. The techniques referred to include those that support either probabilistic as well as deterministic analysis of the network access delay, covering thus diverse levels of real-time requirements from soft to hard real-time applications.

This chapter aims at bringing together in one volume different dispersed pieces of related work, trying to provide a global view of this niche area of real-time communication over Ethernet. The presentation is focused more on conceptual grounds than on mathematical formalism which can be found in the references provided in the text.

Moreover, for the sake of coherency with the original work referred to, several different expressions are used interchangeably in the text, but with similar meaning. This is the case of node, station and host referring to a computing element in a distributed system with independent network access, as well as message, frame and packet, referring to an Ethernet protocol data unit.

2  Ethernet roots

Ethernet was born about 30 years ago, invented by Bob Metcalfe at the Xerox’s Palo Alto Research Center. Its initial purpose was to connect two products developed by Xerox: a personal computer and a brand new laser printer. Since then, this protocol has evolved in many ways. For instance, concerning the transmission speed, it has grown from the original 2.94Mbps to 10Mbps [6][15][17] [16][18], then to 100Mbps [19] and more recently to 1Gbps [20] and 10Gbps [21]. Concerning physical medium and network topology, Ethernet started as a bus topology based initially on thick coaxial cable [15], and afterwards on thin coaxial cable [16]. In the mid 80’s, a more structured and fault-tolerant approach, based on a star topology, was standardized [17], running however only at 1Mbps. At the beginning of the 1990’s, an improvement was standardized [18], running at 10Mbps over category 5 unshielded twisted pair cable.

In this evolution process, two fundamental properties have been kept unchanged:

i)                    Single collision domain, i.e., frames are broadcast on the physical medium, and all the network interface cards (NIC) connected to it receive them;

ii)                   The arbitration mechanism, which is called Carrier Sense Multiple Access with Collision Detection (CSMA/CD).

According to the CSMA/CD mechanism (Figure 1), a NIC with a message to be transmitted must wait for the bus to become idle and, only then, it starts transmitting. However, several NICs may have sensed the bus during the current transmission and then tried to transmit simultaneously thereafter, causing a collision. In this case, all the stations abort the transmission of the current message, wait for a random time interval and try again. This retry mechanism is governed by the truncated Binary Exponential Back-off (BEB) algorithm, which duplicates the randomization interval every retry, reducing the probability of further collisions. The number of retries is limited to sixteen.

Figure 1: Ethernet CSMA/CD simplified state diagram

The use of a single broadcast domain and the CSMA/CD arbitration mechanism has created a bottleneck when facing highly loaded networks: above a certain threshold, as the submitted load increases,  the throughput of the bus decreases - a phenomenon referred to as thrashing. In the beginning of the 90’s, the use of switches in place of hubs has been proposed as an effective way to deal with thrashing. A switch creates a single collision domain for each of its ports. If a single node is connected to each port, collisions never actually occur unless they are created on purpose, e.g. for flow control. Switches also keep track of the addresses of the NICs connected to each port by inspecting the source address in the incoming messages. This allows forwarding incoming messages directly to the respective outgoing ports according to the respective destination address, a mechanism generally known as forwarding. When a match between a destination address and a port cannot be established, the switch forwards the respective message to all ports, a process commonly referred to as flooding. The former mechanism, forwarding, allows a higher degree of traffic isolation so that each NIC receives the traffic addressed to it, only. Moreover, since each forwarding action uses a single output port, several of these actions can be carried out in parallel, resulting in multiple simultaneous transmission paths across the switch and, consequently, in a significant increase in the global throughput.

3  Why using Ethernet at fieldbus level

Operation at fieldbus level implies the capacity to convey time-constrained traffic associated to sensors, controllers and actuators. However, as referred above, Ethernet was not designed to support that type of traffic and some of its properties, such as the non-deterministic arbitration mechanism, pose serious challenges for that purpose. Thus, why using it? Several works address the pros and cons of using Ethernet at the field level (e.g. [5][54][30]). Commonly referred arguments in favor are [5]:

  • It is cheap, due to mass production;

  • Integration with Internet is easy (TCP/IP stacks over Ethernet are widely available, allowing the use of application layer protocols such as FTP and HTTP;

  • Steady increases on the transmission speed have happened in the past, and are expected to occur in the  near future;

  • Due to its inherent compatibility with the communication protocols used at higher levels within industrial systems, the information exchange with plant level becomes easier;

  • The bandwidth made available by existing fieldbuses is insufficient to support some recent developments, such as the use of multimedia (e.g. machine vision) at the field level;

  •  Availability of technicians familiar with this protocol;

  • Wide availability of test equipment from different sources;

  • Mature technology, well specified and with equipment available from many sources, without  incompatibility issues.

On the other side, the most common argument against using Ethernet at the field level is its destructive and non-deterministic arbitration mechanism. A potential remedy is the use of switched Ethernet, which allows bypassing the native CSMA/CD arbitration mechanism. In this case, provided that a single NIC is connected to each port, and the operation is full duplex, no collisions occur.

However, just avoiding collisions does not make Ethernet deterministic: for example, if a burst of messages destined to a single port arrives at the switch in a given time interval, their transmission must be serialized. If the arrival rate is greater that the transmission rate, buffers will be exhausted and messages will be lost. Therefore, even with switched Ethernet, some kind of higher-level coordination is required. Moreover, bounded transmission delay is not the only requirement of a fieldbus. Some other important factors commonly referred to in the literature are: temporal consistency indication, precedence constraints, efficient handling of periodic and sporadic traffic as well as of short packets. Clearly, Ethernet, even with switches, does not provide answers to all those demands.

4  Making Ethernet real-time

The previous section discussed the pros and cons of using Ethernet for real-time communication, particularly for use as a fieldbus. Basically, Ethernet, by itself, cannot fulfill all the properties that are expected from a fieldbus. Therefore, specifically concerning real-time communication, several approaches have been developed and used. Many of them override Ethernet’s CSMA/CD medium access control by setting an upper transmission control layer that eliminates, or at least reduces, the occurrence of collisions at the medium access. Other approaches propose modification of the CSMA/CD medium access control layer so that collisions either occur seldom, or when they do, the collision resolution is deterministic and takes a bounded worst-case time. Moreover, some approaches support such deterministic reasoning on the network access delay while other ones allow for a probabilistic characterization, only.

In the remainder of this chapter, some paradigmatic efforts to improve Ethernet’s behavior with respect to meeting deadlines for the network access delay are briefly discussed. For the sake of clarity, they are classified as:

  • CSMA/CD based protocols;

  • modified CSMA protocols;

  • token passing;

  •  TDMA;

  • master/slave techniques;

  • and switched Ethernet.

5 CSMA/CD based protocols

This category of protocols paradoxically achieves real-time behavior by using standard Ethernet network adapters and relying on the original CSMA/CD contention resolution mechanism. They exploit the fact that the probability of collision upon network access is closely related to the traffic properties, namely the bus utilization factor, message lengths and precedence constraints [25][2]. In fact, knowing such traffic properties allows for computing probabilities of either packet losses or deadline misses [25][41].

In many distributed real-time applications the network utilization can be characterized at design-time, the traffic load is light and there is a predominance of short messages. In these cases, the expected deadline-miss ratio is low. [41] presents a table that contains, for a set of different load, bandwidth and deadline requirements, the interval of time for which there is a 99% probability of all messages being delivered within their respective deadlines. For instance, in a 100Mbps network with one thousand messages of 128 bytes payload generated every second, and a message deadline of 1ms, the referred interval is 1140 years! However, for the same load but with 2 ms deadlines, in a 10Mbps network that interval is reduced to one hour. This huge reduction in the interval for which all deadlines are met with 99% probability results from an increase in bus utilization from 1% to 10% approximately.

The numbers above show a strong dependence between the network utilization and the deadline miss probability. They also show that keeping the utilization sufficiently low, 1% in that case, and using relatively short messages, 128 bytes of payload in the example, the deadline miss probability can be almost negligible. Despite seeming a significant waste of resources, the fact that Ethernet bandwidth is significantly higher than the requirements of many practical applications makes this possibility a practical one. In fact, there are protocols for real-time communication over Ethernet that rely solely on such a low bandwidth utilization factor with small payloads, as shown in the following subsection on NDDS and ORTE.

Achieving higher bandwidth utilization factors together with a low deadline miss probability requires further control over the traffic in order to avoid bursts. This is called traffic smoothing or shaping and it is explained further on in this section. In both cases, the probability of deadline misses is non-zero but can be made arbitrarily small with a corresponding penalty in bandwidth efficiency. Thus, in principle, these techniques may also be used in hard real-time applications but their natural target is in soft real-time ones, such as multimedia, in which an occasional deadline miss just causes some transitory performance degradation. The application of these techniques to distributed computer control systems is also possible as long as such systems, from the control point of view, are designed to tolerate occasional sample losses.

5.1 NDDS

Network Data Delivery Service (NDDS) [38] is a middleware for distributed real-time applications made by Real-Time Innovation, Inc., based on the Real-Time Publisher-Subscriber model [43]. It is currently in a process of standardization within the Object Management Group (OMG), which has recently released the Data Distribution Service for Real-Time Systems Specification [4].

The NDDS architecture is depicted in Figure 2.

Figure 2: NDDS Architecture

The system architecture is centered on the NDDS database, which holds all the pertinent data concerning groups of publishers and subscribers. This database is accessible to the NDDS library and to a set of NDDS tasks. The former component provides a comprehensive set of services to the user applications, in the form of an Application Programming Interface. The NDDS tasks manage subscriptions and services, and send and receive publication updates. The NDDS database is shared among all network nodes, providing them with a holistic view of the communication requirements. Such global knowledge may be used to compute the probabilistic guarantee of deadline misses for the current load. This information is then made available to the application. There is no other mechanism to support traffic timeliness beyond admission control on the current communication load.

However, in terms of fault-tolerance, which is often a requirement of real-time systems, NDDS provides other mechanisms to support publisher redundancy. Thus, each group may have several subscribers and several replicated publishers of the same entity, e.g. a temperature value, all producing it in parallel. Each publication has two additional associated parameters: strength and persistency. The strength defines the relative weight of a publisher with respect to other publishers of the same entity. The persistency specifies the temporal validity of the publication. Subscribers consider a publication only if its strength is greater than or equal to the strength of the last one they received of that entity. In case the persistency window expires, the first publication of that entity after that instant is always accepted, irrespectively of its strength.

These mechanisms are complemented with sequence numbers assigned to each publication that allow detecting missing instances. Publishers keep their publications in a buffer for a specified period. During that period, subscribers that missed a publication may request its retransmission.

5.2 ORTE

The OCERA Real-Time Ethernet (ORTE) communication protocol [47] has been developed in the scope of an open source implementation of the Real-Time Publisher-Subscriber protocol [43]. This protocol allows establishing statistical real-time channels on Ethernet based on the limitation of the bandwidth utilization. The internal architecture of ORTE is presented in Figure 3, and it is functionally equivalent to the NDDS architecture (Section 5.1).

Figure 3: ORTE Internal Architecture

The ORTE layer is composed of Manager objects (M), which are responsible for the traffic management, and Managed Applications (MA), which are the objects that represent the user application within the ORTE layer.

Publisher redundancy and acknowledged message transmissions are supported by mechanisms equivalent to the ones presented in Section 5.1.

5.3 Traffic smoothing

Kweon et al introduced the traffic smoothing technique in [25]. In this work, the authors showed analytically that it is possible to provide a probabilistic guarantee that packets may be successfully transmitted within a pre-defined time bound, if the total arrival rate of new packets generated by all the stations on the network is kept below a threshold, called network-wide input limit. The probabilistic guarantee can be expressed by P(D£Dk*)>1­‑Z where Z is the loss tolerance and Dk* is the worst-case delay suffered by a packet when it is successfully transmitted at the kth trial.

Therefore, if the network average load is kept below a given threshold and bursts of traffic are avoided, a low probability of collisions can be obtained, so as an estimation of the network induced delay.

Figure 4: Software architecture of traffic smoothing

To enforce this behavior, an interface layer called traffic smoother (Figure 4) is placed just above the Ethernet data link layer. This element is in charge of shaping the traffic generated by the respective node according to a desired rate commonly referred to as station input limit. The traffic smoother implements a leaky bucket with appropriate depth and rate, that captures and smoothes the non-real-time traffic generated by the node. On the other hand, the real-time traffic is, by its nature, non-bursty thus spaced in time, with short payloads, resulting in low collision probability. Thus, it does not need smoothing and is immediately sent to the network, bypassing the leaky bucket.

The station input limit, i.e. the parameters of the leaky bucket, can either be defined statically at design time, or dynamically according to the current traffic conditions. The original implementation used the static approach, in which the station input limit is assigned at design time. A side effect of this technique is that it may lead to poor network bandwidth utilization whenever, at run-time, one or more stations use less bandwidth than they were assigned. In such circumstance, the unused bandwidth is simply wasted. Moreover, the number of stations must be known a priori to compute the station input limits. This is not adequate to an open system with stations that connect and disconnect at run-time.

Kweon et al [26] and Lo Bello et al [29] proposed the dynamic approach in which the bus load in assessed on-line and the station input limit may vary within a pre-define range. Thus, stations having higher communication requirements at some particular instant in time may reclaim bandwidth that is not being used by other stations at that time, thus increasing the potential bandwidth utilization. Moreover, as stations dynamically adapt themselves to the traffic conditions, this solution also scales well when the number of stations increases.

Lo Bello et al [31] developed a further evolution of the dynamic approach that consists in estimating the network load on-line using two parameters, the number of collisions and the throughput, both observed in a given interval. These parameters are then fed to a fuzzy controller to set the instantaneous station input limit. The resulting efficiency is substantially higher compared with both the static and the dynamic approaches relying on a single parameter to assess the bus state.

6 Modified CSMA protocols

As opposed to the previous category in which the native arbitration mechanism of Ethernet, i.e. CSMA/CD, is used as it is, in this category such arbitration protocol is adequately modified, namely the back off and retry mechanism, in order to improve the temporal behavior of the network (e.g. [28][46][3]). The result is still a fully distributed arbitration protocol of the CSMA family (Carrier Sense, Multiple Access) that determines when to transmit based on local information and on the current state of the bus, only.

There are two common options in this category, either delaying transmissions to reduce the probability of collisions or sorting out collisions in a controlled way. This section presents four protocols the first of which, Virtual Time CSMA, follows the first option by implementing a type of CSMA/CA (CSMA with Collision Avoidance) that delays message transmissions according to a temporal parameter. The remaining three protocols, Windows, CSMA/DCR and EQuB, follow the second option modifying the back-off and retry mechanism so that the network access delay for any contending message can be bounded. The latter two, i.e. CSMA/DCR and EquB, support a deterministic bound while the former, Windows, still uses a probabilistic approach to sort out particular situations.

6.1 Virtual Time CSMA

The Virtual Time CSMA protocol was presented in [33][37]. It allows implementing different scheduling policies by assigning different waiting times to messages submitted for transmission. The traffic on the bus is then serialized according to such waiting times, following an order that approximates the chosen scheduling policy. This mechanism is highly flexible in the sense that all common real-time scheduling policies can be implemented, either static priorities-based, e.g. rate-monotonic and deadline-monotonic, or dynamic priorities-based, e.g. minimum laxity first and earliest deadline first.

One of the most interesting features of this protocol is that its decisions are based on the assessment of the communication channel status, only. When the bus becomes idle and a node has a message to transmit, it waits for a given amount of time, related to the scheduling policy implemented. For example, if minimum laxity first (MLF) scheduling is used, the waiting time is derived directly from the laxity using a proportional constant. When this amount of time expires, and if the bus is still idle, the node tries to transmit the message. If a collision occurs, then the scheduler outcome resulted in more than one message having permission to be transmitted at the same time (e.g. when two messages in two nodes have the same laxity in MLF). In this case, the protocol can either recalculate the waiting time using the same rule or using a probabilistic approach. This last option, is important to sort out situations in which the scheduler cannot differentiate messages, e.g. messages with the same laxity would always collide.

Figure 5: Example of Virtual-Time CSMA operation using MLF

Figure 5 shows the operation of the Virtual-Time CSMA protocol with MLF scheduling. During the transmission of message m, messages a and b become ready. Since the laxity of message a (i.e. time to the deadline minus message transmission time) is shorter than the laxity of message b, message a is transmitted first. During the transmission of message a, message c arrives. Messages b and c have the same deadline and the same laxity. Therefore, an attempt will be made to transmit them at the same time, causing a collision. Then, the algorithm uses the probabilistic approach, with message b being given a random waiting time lower than that of message c, and thus being transmitted next. When the transmission of message b ends, the waiting time for message c is recomputed, and only after the expiration of this interval message c is finally transmitted.

Beyond the advantage of using standard Ethernet hardware, this approach also has the advantage of not requiring any other global information but the channel status, which is readily available from all NICs. Thus, the protocol can be implemented in a fully distributed and uniform way, with a relatively low computational overhead. Nevertheless, this approach presents some important drawbacks:

Performance highly dependent on the proportional constant value used to generate the waiting time, leading to:

o Excess of collisions if it is too short;

o Large amount of idle time if it is too long.

      Proportional constant value depends on the properties of the message set, therefore on-line changes to that set can lead to poor performance;

      Due to possible collisions, worst-case transmission time can be much higher than average transmission time and thus only probabilistic timeliness guarantees can be given.

      When implemented in software using off-the-shelf NICs, the computational overhead grows with the level of traffic on the bus because every transmission or collision raises an interrupt in all nodes to trigger the intervals of waiting time. This can be costly for networks with higher transmission rates, such as Fast or Gigabit Ethernet, mainly when the bus traffic includes many short messages.

6.2 Windows protocol

The Windows protocol was proposed both for CSMA/CD and token ring networks [33]. Concerning the CSMA/CD implementation, the operation is as follows. The nodes on a network agree on a common time interval referred to as window. All nodes synchronize upon a successful transmission, restarting the respective window. The bus state is used to assess the number of nodes with messages to be transmitted within the window:

  • if the bus remains idle, there are no messages to be transmitted in the window;

  • if only one message is in the window, it will be transmitted;

  • if two or more messages are within the window, a collision occurs.

Depending on the bus state, several actions can be performed:

  • if the bus remains idle, the window duration is increased in all nodes;

  • in the case of a collision, the time window is shortened in all nodes;

  • in case of a successful transmission, the window is restarted and its duration is kept as it is.

In the first two cases, the window duration is changed but the window is not restarted. Moreover, the window duration varies between a maximum (initial) and minimum values. Whenever there is a sufficiently long idle period in the bus, the window will return to its original maximum length. If a new node enters dynamically in the system, it may have an instantaneous window duration different from the remaining nodes. This may cause some perturbation during an initial period, with more collisions than expected. However, as soon as an idle period occurs, all windows will converge to the initial length. A probabilistic retry mechanism may also be necessary when the windows are shrunk to their minimum and collisions still occur (e.g. when two messages have the same transmission time).

Figure 6 : Resolving collisions with the Windows protocol

 Figure 6 shows a possible operational scenario using the windows protocol implementing MLF message scheduling. The top axis represents the latest send times (lst) of messages A, B and C. The lst of a message is the latest time instant by which the message transmission must start so that the respective deadline is met. This is equivalent to the laxity of the message as presented in the previous subsection. The first window (Step 1) includes the lst of three messages, thus leading to a collision. The intervenient nodes feel the collision, and the window is shrunk (Step 2). However, the lst of messages A and B are still inside the window, causing another collision. In response to this event the window size is shrunk again (Step 3). In this case only message A has its lst within the window, leading to a successful transmission.

This method exhibits properties that are very similar to those of the previous method (virtual time protocol). It is based on local information, only, it supports probabilistic bounds to the network delay and it can be easily implemented in a fully distributed and uniform way. The computational overhead is also similar to the previous case, growing for higher levels of bus traffic when implemented in software. However, this approach is more efficient than virtual time because of its adaptive behavior that can easily cope with dynamic number of nodes and dynamic communication requirements. Its efficiency, though, is substantially influenced by the magnitude of the steps in the variations of the window duration.

6.3 CSMA/DCR

In [28], LeLann and Rivierre presented the CSMA/DCR protocol, where DCR stands for Deterministic Collision Resolution. This protocol implements a fully deterministic network access scheme that consists on a binary tree search of colliding messages, i.e. there is a hierarchy of priorities in the retry mechanism that allows calculating the maximum network delay that a message can suffer.

During normal operation, the CSMA/DCR follows the standard IEEE 802.3 protocol (Random Access mode). However, whenever a collision is detected the protocol switches to the Epoch mode. In this mode, lower priority message sources voluntarily cease contending for the bus, and higher priority ones try again. This process in repeated until a successful transmission occurs (Figure 7). After all frames involved in the collision are transmitted, the protocol switches back to random access mode.

Figure 7: Example of tree search with CSMA/DCR

Figure 7 together with Table 1 depict the CSMA/DCR operation in a situation where 6 messages collide. Considering that lower indexes correspond to higher priorities, after the initial collision the right branch of the tree (messages 12, 14 and 15) cease contending for bus access. Since there are still three messages on the left branch, a new collision appears, between messages 2, 3 and 5. Thus, the left sub-branch is selected again, leaving message 5 out. In the following slot, messages 2 and 3 will collide again. The sub-branch selected after this collision has no active messages, and thus in the following time slot the bus will be idle (step 4). This causes a move to the right sub-branch, where messages 3 and 5 reside, resulting in a new collision. Finally, in step 6 the branch containing only the message with index 5 is selected, resulting in a successful transmission. The algorithm continues this way until all messages are successfully transmitted.

Table 1: Tree search example (channel status: C-collision, I-idle, X-transmission)

Despite assuring a bounded access time to the transmission medium, this approach exhibits two main drawbacks:

  • In some cases (e.g. [28]),  the firmware must be modified, therefore the economy of scale obtained when using standard Ethernet hardware is lost;

  • The worst-case transmission time, which is the main factor considered when designing real-time systems, can be substantially longer than the average transmission time. Consequently, all worst-case analysis will be very pessimistic leading to low bandwidth utilization, at least concerning real-time traffic.

6.4 EQuB

Sobrinho and Krishnakumar proposed the EquB protocol [49], which allows achieving predictable behavior on shared Ethernet networks. It consists on an overlay mechanism to the native CSMA/CD that allows real-time and non-real-time traffic to coexist on the same network while providing privileged access to the former over the latter, with a FCFS (First-Come-First-Served) access discipline between contending real-time sources.

The collision resolution mechanism for real-time sources requires disabling the native exponential back-off mechanism of Ethernet and the capacity to transmit jamming sequences with pre-defined durations. Both features must be configured in the network interface card of the respective hosts but the latter feature is not commonly supported by off-the-shelf NICs. The underlying real-time traffic model assumes that during long intervals of time, called sessions, real-time hosts generate continuously periodic streams of data to be transmitted over the network. This is the case, for example, when a host initiates the transmission of a video stream at constant bit rate.

Collisions involving non-real-time hosts, only, are sorted out by the native CSMA/CD mechanism of Ethernet. However, when real-time hosts participate in a collision, they transmit a jamming signal that is longer than that specified in the Ethernet MAC protocol, i.e. 32 bit times. These crafted jamming signals are called black bursts and their maximum duration is set proportional to the time a given host has been waiting to transmit a given message, i.e. the duration of the collision resolution process. During the transmission of a black burst, the bus state is continuously monitored. If, at some moment, a real-time host contending for the bus detects that no other nodes are sending black bursts, it infers that it is the host having the oldest ready message (highest priority according to FCFS), subsequently aborts the transmission of its own black-burst and transmits the data message immediately after. If a real-time host transmits its complete black burst and still feels the bus jammed it infers that other hosts having longer black-bursts, and consequently longer waiting times, are also disputing the bus. In this circumstance the host backs off, waiting for the bus to become idle for the duration of an inter-frame space (IFS). At this time, the black burst duration is recomputed to reflect the increased waiting time, and a new attempt is made to transmit the message.

Figure 8 illustrates the bus arbitration mechanism with two hosts having one real-time message each, 1 and 2, scheduled for transmission at instants t0 and t1, respectively, while a third data message is being transmitted. Since both hosts feel the bus busy, they wait for the end of current message transmission plus an IFS, which occurs at instant t3. According to EQuB, both nodes attempt to transmit their message at time t3 generating a collision and starting the transmission of black bursts (t4). Since message 2 has a shorter waiting time than message 1, its black burst is completely transmitted, terminating at instant t5, and the respective host backs-off, waiting for the bus to become idle again, before retrying the message transmission. At that point in time, the winning host, which has the oldest message, detects that there are no more jamming sequences from other hosts, stops sending its own jamming signal and initiates immediately the transmission of its data message, which happens at instant t6.

Figure 8: Black burst contention resolution mechanism

It is important to realize that non real-time data messages always loose the arbitration against any real-time message because real-time hosts transmit their messages right after the jamming signal without further delay, while the non-real-time messages follow the standard Ethernet back-off and retry mechanism (BEB). On the other hand, among real-time messages, the ones with longer waiting time are associated to longer black bursts. Thus, they are transmitted before other real-time messages with shorter waiting times, resulting in the FCFS serialization as discussed before.

Moreover, the EQuB protocol also takes advantage of the underlying periodic model of the real-time traffic and schedules the next transmission in each host one period later with respect to the transmission instant of the current instance. Thus, in some circumstances, particularly when the message periods in all real-time hosts are equal or harmonic, the future instances of the respective messages will not collide again, leading to a high efficiency in bus utilization and a to a round-robin service of real-time hosts.

7 Token passing

One well-know medium access control technique suited for shared broadcast bus networks is token passing. According to this technique, there is a single token in the entire network at any instant and only the node having possession of the token is allowed to trigger message transactions. The token is then circulated among all nodes according to an order that is protocol dependent.

In the simplest and more common way, the token rotates in a circular fashion, which tends to divide the bandwidth equally among all nodes in high traffic load conditions. For asymmetrical bandwidth distribution, some protocols allow the token to visit the same node more than one time in each token round. In both cases, a basic condition for real-time operation is that the time spent by the token at each node must be bounded. This can be achieved by using a timed-token protocol [32] as in the well-known cases of FDDI, IEEE 802.4 Token Bus and PROFIBUS. The same technique, i.e. a timed-token protocol, can be used to enforce real-time behavior on Ethernet networks, overriding the native CSMA/CD arbitration.

A commonly referred pitfall of the previous approaches is that token losses take time to be detected and recovered from, causing a transitory disruption in the network operation. Also, the bus access is not periodic due to the irregularity of token arrivals, causing a considerable jitter in high rate periodic message streams.

The above-referred pitfalls have been addressed by other particular protocols, e.g. RETHER and RT-EP, both explained below, that have substantially different token management policies. For example, in RETHER the token rotation is triggered periodically, irrespectively of the traffic transmitted in each cycle. On the other hand, in RT-EP the token is first circulated among all nodes to reach an agreement on the highest priority message ready to be transmitted and then it is directly handled to the respective transmitting node.

7.1 Timed-Token protocols

In timed-token protocols [32], the token visits all the nodes in a fixed order, without previous knowledge about their state concerning the number or priority of ready messages. Therefore, upon token arrival, a node may have several messages ready for transmission or none. In the former case, the node transmits its ready messages while in possession of the token. In the latter case, the token is forwarded immediately. The crux of the protocol consists in enforcing an upper limit to the interval of time that a node can hold the token before forwarding it, i.e. the token holding time. This interval of time is set dynamically upon each token arrival according to the difference between the target and the effective token rotation times. The target token rotation time is a configuration parameter with deep impact on the temporal behavior of the system. For example, it influences directly the worst-case interval between two consecutive token visits. The effective token rotation time is the interval that actually ellapsed bewteen a token arrival and the arrival of the previous one at the same node. Therefore, during each token visit, a node has more or less time to transmit messages depending on whether the token arrived early or late, respectively. In any case, a minimum transmission capacity is always granted to every node during each token visit to reduce network inaccessibility periods (synchronous bandwidth in FDDI and 802.4, or one high priority message in PROFIBUS).

Knowing the global communication requirements as well as the number of nodes in the network, it is possible to upper bound the time between two consecutive token visits to each node, thus providing an upper bound to the real-time traffic latency. The respective feasibility analysis is shown in [32] for IEEE 802.4 and in [53] for PROFIBUS.

Steffen et al [51] present an implementation of this concept on shared media local area networks. Although aiming particularly at shared Ethernet, the method may also be applied to networks like HomePNA [14] and Powerline [40].

The extended Ethernet protocol stack proposed in [51] is depicted in Figure 9.

Figure 9: Extended Ethernet protocol stack for timed-token operation

All the nodes connected to the network have a QoS sub layer (Token-Passing Protocol in Figure 9), which interfaces the Logical Link Control and the Medium Access Control layers. The QoS sub layer overrides the native arbitration mechanism, controlling the access to the bus via a token passing mechanism.

This protocol defines two distinct types of message streams, synchronous and asynchronous. Synchronous traffic is assumed to be periodic and its maximum latency can be bounded. It is characterized by the message transmission time, period and deadline. Asynchronous traffic is handled according to a best-effort policy, and thus no real-time guarantees are provided. Asynchronous streams are characterized by the message transmission time and desired average bandwidth.

Whenever the token arrives at a node, the synchronous messages are sent first. All nodes are granted at least a pre-defined synchronous bandwidth in all token visits to send such type of traffic. After the synchronous bandwidth is exhausted, a node can continue to transmit up to the exhaustion of its token holding time. After that, the token is forwarded to the next node in the circulation list.

In [51] Steffen et al present the adaptation of existing analytical tools to carry the feasibility analysis of the real-time communication requirements.

7.2 RETHER

The RETHER protocol was proposed by Venkatramani and Chiueh in [56]. This protocol operates in normal Ethernet CSMA/CD mode until the arrival of a real-time communication request, upon which it switches to token-bus mode.

In token-bus mode, real-time data is considered to be periodic and the time is divided in cycles of fixed duration. During the cycle duration the access to the bus is regulated by a token. First the token visits all nodes that are sources of real-time (RT) messages. After this phase, and if there is enough time until the end of the cycle, the token visits the sources of non-real-time (NRT) messages. An on-line admission control policy assures that all accepted RT requests can always be served and that new RT requests cannot jeopardize the guarantees of existing RT messages. Therefore, in each cycle all RT nodes can send their RT messages. However, concerning the NRT traffic, no timeliness guarantees are granted.

Figure 10 illustrates a possible network configuration with 6 nodes. Nodes 1 and 4 are sources of RT messages, forming the RT set. The remaining nodes have no such RT requirements and constitute the NRT set. The token first visits all the members of the RT set and after, if possible, the members of the NRT set. A possible token visit sequence could be: cycle i {1 – 4 – 1 – 2 – 3 – 4 – 5 – 6}, cycle i+1 {1 – 4 – 1 – 2}, cycle i+2 {1 – 4 – 1 – 2 – 3 – 4}… . In the ith cycle the load is low enough so that the token has time to visit the RT set plus all nodes in the NRT set, too. In the following cycle, besides the RT set, the token only visits nodes 1 and 2 of the NRT set and, in the next cycle, only nodes 1 through 4 of the NRT set are visited.

Figure 10 : Sample network configuration for RETHER

This approach supports deterministic analysis of the worst-case network access delay, particularly for the RT traffic. Furthermore, if the NRT traffic is known a priori, it is also possible to bound the respective network access delay which can be important, for example, for sporadic real-time messages. However, since the bandwidth available for NRT messages is distributed according to the nodes order established in the token circulation list, the first nodes always get precedence over the following ones leading to potentially very long worst-case network access delays. Moreover, this method involves a considerable communication overhead caused by the circulation of the token.

7.3 RT-EP: Real-Time Ethernet Protocol

The Real-Time Ethernet Protocol (RT-EP) [34][35] is a token-passing protocol which operates over Ethernet and which was designed to be easily analyzable using well-known schedulability analysis techniques.

An RT-EP network is logically organized as a ring, each node knowing which other nodes are its predecessor and successor. The token circulates from node to node within this logical ring.

Access to the bus takes place in two phases, arbitration and application message transmission. In the arbitration phase, the token visits all the nodes arranged in the logical ring to determine the one holding the highest priority message ready for transmission. For this purpose, the token conveys a priority threshold that is initialized with the lowest priority in the system every time an application message is transmitted. Then, upon token arrival, each node compares the priority of its own ready messages, if any, with the priority encoded in the token. If any of its ready messages has higher priority than the one encoded in the token, this one is updated. The token also carries the identity of the node that contains the highest priority message found so far.

After one token round the arbitration phase is concluded and the token is sent directly to the node having the highest priority ready message so that it can transmit it, i.e. application message transmission phase. After concluding the application message transmission, the same node starts a new arbitration phase.

Internally (Figure 11), each node has one priority transmission queue, within which all outgoing messages are stored in priority order, and a set of reception priority queues, one for each application requesting the reception of messages. The Main Communication Thread handles all the interaction with the network, carrying all the protocol related operations, namely the arbitration and application message transmission and reception.

Figure 11: RT-EP node architecture

User applications have access to three services:

  • Init_Comm: performs network initialization;

  •   Send_Info: places a message in the TX Queue for transmission;

  •  Recv_Info: reads a message (if present) from the application RX Queue.

RT-EP packets are carried in the Data field of the Ethernet frames. There are two distinct types of RT-EP packets, Token Packets and Info Packets.

The Token Packets are used during the arbitration phase, and contain: a packet identifier, specifying the functionality of the packet; priority and station address fields, identifying the highest priority ready message as well as the respective station ID; and a set of fields used to handle faults.

The Info Packets carry the actual application data and contain: a packet identifier field, specifying the packet’s type; a priority field, which contains the priority of the message being conveyed; a channel ID field, identifying the destination queue in the receiver node; a length field, defining the message data size; an info field, carrying the actual message data; and a packet number field, which is a sequence number used for fault-tolerance purposes.

The fault-tolerance mechanism [35] allows recovering from message losses, including token losses, within a bounded time. This mechanism is based on forcing all stations to permanently listen to the bus. Following any transaction, the predecessor station monitors the bus waiting for the following transmission of the next frame by the receiving station. If the receiving station does not transmit any frame within a given time window, the predecessor station assumes a message loss and retransmits it. After a predefined number of unsuccessful retries, the receiving station is considered as a failing station and it is excluded from the logical ring. This mechanism my lead to the occurrence of message duplicates. The sequence number field, present both in the Token and Info packets is used to discard the duplicate messages at the receiving nodes.

This protocol has been implemented on nodes running MaRTE OS, a POSIX compliant real-time kernel.

8 Time Division Multiple Access - TDMA

Another well-known technique to achieve predictable temporal behavior on shared communication networks is to assign exclusive time slots to distinct data sources, either nodes or devices, in a cyclic fashion. This is known as Time Division Multiple Access (TDMA) and it implies a global synchronization framework so that all nodes agree on their respective transmission slots. Hence, this is also a collision free medium access protocol that can be used on top of shared Ethernet to override its native CSMA/CD mechanism and prevent the negative impact of collisions. TDMA mechanisms are widely used, mainly in safety-critical applications. Examples of TDMA-based protocols include TTP/C, TT-CAN, SAFEBus and SWIFTNET. The remainder of this section addresses two particular TDMA implementations on shared Ethernet.

8.1 The MARS bus

The MARS bus was the networking infrastructure used in the MARS (MAintenable Real-time System) architecture [24] developed in the late 80s. Soon after, the MARS bus evolved into what is nowadays the TTP/C protocol. The MARS architecture aimed at fault-tolerant distributed systems providing active redundancy mechanisms to achieve high predictability and ease of maintenance.

In MARS, all activities are scheduled off-line, including tasks and messages. The resulting schedule is then used on-line to trigger the system transactions at the appropriate instants in time. Interactions among tasks, either local or remote, are carried out via MARS messages. It is the role of the MARS bus to convey MARS messages between distinct nodes (cluster components).

The MARS bus was based on a 10BASE2 Ethernet using standard Ethernet interface cards. A TDMA scheme was used to override Ethernet’s native CSMA/CD medium access control. The TDMA round consisted of a sequence of slots of equal duration, each assigned to one node in a circular fashion. Moreover, during each slot the tasks in each node were scheduled in a way to prevent contention between tasks on bus access [44].

8.2 Variable Bandwidth Allocation Scheme

The Variable Bandwidth Allocation Scheme was proposed for Ethernet networks by Lee and Shin in [27]. Basically, it is a TDMA transmission control mechanism in which the slots assigned to each node in the TDMA round can have different durations. This feature allows tailoring the bandwidth distribution among nodes according to their effective communication requirements and thus it is more bandwidth efficient than other TDMA-based mechanisms relying on equal duration slots, as it was the case in the MARS bus. Nowadays, this feature has been incorporated in most of the existing TDMA-based protocols, e.g. TTP/C and TT-CAN, improving their bandwidth efficiency.

Moreover, this technique also comprises the possibility of changing the system configuration on-line, namely adding or removing nodes, a feature that is sometimes referred to as Flexible TDMA (FTDMA) [57].

The nomenclature used in [27] uses the expression frame to refer to the TDMA round. Both the frame duration (frame time - F) together with the slot durations (slot times - Hi) are computed according to the specific traffic characteristics. The first slot in each frame (Tc) is reserved for control purposes such as time synchronization and addition/deletion of nodes. The structure of a TDMA frame is depicted in Figure 12.

Figure 12: The structure of a TDMA frame

The transmission of the control slot, Tc, as well as the inter-slot times, represent communication overhead. The inter-slot time must be sufficient to accommodate a residual global clock inaccuracy and to allow nodes to process incoming messages before the start of the following slot.

In their work, the authors derive a set of necessary conditions that a given allocation scheme f has to fulfill to compute both the frame (F) and slot durations (Hi) according to the communication requirements, i.e. message transmission times (Ci), periods (Pi) and system overhead (γ).

f: ( {Ci}, {Pi},  γ ) ( {Hi}, F )

Based on those conditions, the authors present an algorithmic approach for carrying the computation of F and Hi, and compare the results of this methodology with other TDMA approaches, namely MARS. The results obtained show the improvement in bandwidth utilization that may be achieved with this variable bandwidth allocation scheme.

9 Master/slave techniques

One of the simplest ways of enforcing real-time communication over a shared broadcast bus, including Ethernet, consists on using a master/slave approach, in which a special node, the master, controls the access to the medium of all other nodes, the slaves. The traffic timeliness is then reduced to a problem of scheduling that is local to the master. However, this approach typically leads to a considerable under-exploitation of the network bandwidth because every data message must be preceded by a control message issued by the master, resulting in a substantial communication overhead. Moreover, there is some extra overhead related to the turnaround time, i.e. the time that must elapse between consecutive messages, since every node must fully receive and decode the control message before transmitting the respective data message. Nevertheless, it is a rugged transmission control strategy that has been used in many protocols. This section will describe two examples, namely ETHERNET Powerlink [9] and FTT-Ethernet [39].

The case of FTT-Ethernet deserves a particular attention because it implements a variant of the master/slave technique that allows a substantial reduction in the protocol communication overhead. This is called the master/multi-slave approach [1] according to which the bus time is broken in cycles and the master issues one control message each cycle, only, indicating which data messages must be transmitted therein. This mechanism has been developed within the Flexible Time-Triggered communication framework (FTT) [8], and has been implemented over different network protocols, such as Controller Area Network [1] and Ethernet [39].

9.1 FTT-Ethernet protocol

The FTT-Ethernet protocol [39] combines the master/multi-slave transmission control technique with centralized scheduling, maintaining both the communication requirements and the message scheduling policy localized in one single node, the Master, and facilitating on-line changes to both, thus supporting a high level of operational flexibility.

Figure 13: FTT-Ethernet traffic structure

The bus time is divided in fixed duration time-slots called Elementary Cycles (ECs) that are further decomposed in two phases, the synchronous and asynchronous windows (Figure 13), which have different characteristics. The synchronous window carries the periodic time-triggered traffic that is scheduled by the master node. The expression time-triggered implies that this traffic is synchronized to a common time reference, which in this case is imposed by the master. The asynchronous window carries the sporadic traffic either related to protocol control messages, event-triggered messages, and non-real-time traffic in general . There is a strict temporal isolation between both phases so that the sporadic traffic does not interfere with the time-triggered one.

Despite allowing on-line changes to the attributes of the time-triggered traffic, the FTT-Ethernet protocol enforces global timeliness using on-line admission control. Due to the global knowledge and centralized control of the time-triggered traffic, the protocol supports arbitrary scheduling policies (e.g. RM and EDF), and may easily support dynamic QoS management complementary to admission control.

Beyond the flexibility and timeliness properties that this protocol exhibits, there are also some drawbacks that concern the computational overhead required in the master to execute both the message scheduling and the schedulability analysis on-line. This is, however, confined to one node. The computational power required by the slaves in what concerns the communication protocol is just to decode the trigger message in time and start the due transmissions in the right moments. Finally, in safety-critical applications the master must be replicated, for which there are specific mechanisms to ensure coherency between their internal databases holding the system communication requirements.

9.2 ETHERNET Powerlink

ETHERNET Powerlink [9] is a commercial protocol providing deterministic isochronous real-time communication, operating over hub-based Fast-Ethernet networks. A recently developed version, i.e. version 2.0, also allows operation over switched Ethernet networks but for applications with more relaxed temporal constraints, only. The protocol supports either periodic (isochronous) as well as event (asynchronous) data exchanges, and when implemented on hubs, it also provides a very tight time synchronization (accuracy better than 1μs) and fast update cycles (in the order of 500μs) for the periodic traffic. From architectural and functional points of view, this protocol bears many resemblances with the WorldFIP fieldbus.

The ETHERNET Powerlink protocol uses a master/slave transmission control technique, which completely prevents the occurrence of collisions at the bus access [10]. The network architecture is asymmetric, comprising a so-called Powerlink Manager (Master), and a set of Powerlink Controllers (Slaves). The former device controls all the communication activities, assigning time slots to all the remaining stations. The latter devices, Controllers, are passive bus stations, sending information only after explicit request from the Manager.

The Powerlink protocol operates isochronously, with the data exchanges occurring in a cyclic framework based on a micro cycle of fixed duration, i.e. the Powerlink cycle. Each cycle is divided in four distinct phases, called Start, Cyclic, Asynchronous and Idle Periods (Figure 14).

Figure 14: Powerlink cycle structure

A Powerlink cycle starts with a Start of Cycle message, sent by the Manager. This is a broadcast message, which instructs Controllers that a new cycle will start, and thus allows them to carry the preparation of the necessary data.

After the Start-Period follows the Cyclic-Period, in which the Controllers transmit the isochronous traffic. The transactions carried out in this period (window) are fully controlled by the Manager, which issues poll requests (PollRequest) to the Controllers. Upon reception of a PollRequest, controllers respond by transmitting the corresponding data message (PollResponse). The PollRequest message is a unicast message, directly addressed to the Controller node involved in the transaction. The corresponding PollResponse is a broadcast message, thus facilitating the distribution of data among all system nodes that may need it (producers-distributor-consumers communication model). Isochronous messages may be issued every cycle or every given number of cycles according to the application communication requirements. After completing all isochronous transactions of one cycle, the Manager transmits an End of Cycle message, signaling the end of the Cyclic-Period.

Asynchronous transactions may be carried out between the end of the Cyclic-Period and the end of the Powerlink Cycle. These messages may be asynchronous data messages (Invite/Send) or management messages, like Ident/AsyncSend, issued by the Manager to detect active stations. Since these transactions are still triggered by the Powerlink Manager, any node having asynchronous data to send must first notify the Manager of that fact. This is performed during an isochronous transaction involving that particular node, using piggybacked signaling in the respective PollResponse message. The Manager maintains a set of queues for the different asynchronous request sources, and schedules the respective transactions within the Asynch-Period, if there is time enough up to the end of the cycle. In case there is not time enough to complete a given asynchronous transaction or there is no scheduled asynchronous transaction, then the protocol inserts idle-time in the cycle (Idle-Period) in order to strictly respect the period of the