| 1 | Rocker Network Switch Register Programming Guide |
| 2 | Copyright (c) Scott Feldman <sfeldma@gmail.com> |
| 3 | Copyright (c) Neil Horman <nhorman@tuxdriver.com> |
| 4 | Version 0.11, 12/29/2014 |
| 5 | |
| 6 | LICENSE |
| 7 | ======= |
| 8 | |
| 9 | This program is free software; you can redistribute it and/or modify |
| 10 | it under the terms of the GNU General Public License as published by |
| 11 | the Free Software Foundation; either version 2 of the License, or |
| 12 | (at your option) any later version. |
| 13 | |
| 14 | This program is distributed in the hope that it will be useful, |
| 15 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 16 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 17 | GNU General Public License for more details. |
| 18 | |
| 19 | SECTION 1: Introduction |
| 20 | ======================= |
| 21 | |
| 22 | Overview |
| 23 | -------- |
| 24 | |
| 25 | This document describes the hardware/software interface for the Rocker switch |
| 26 | device. The intended audience is authors of OS drivers and device emulation |
| 27 | software. |
| 28 | |
| 29 | Notations and Conventions |
| 30 | ------------------------- |
| 31 | |
| 32 | o In register descriptions, [n:m] indicates a range from bit n to bit m, |
| 33 | inclusive. |
| 34 | o Use of leading 0x indicates a hexadecimal number. |
| 35 | o Use of leading 0b indicates a binary number. |
| 36 | o The use of RSVD or Reserved indicates that a bit or field is reserved for |
| 37 | future use. |
| 38 | o Field width is in bytes, unless otherwise noted. |
| 39 | o Registers are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear |
| 40 | on read. |
| 41 | o TLV values in network-byte-order are designated with (N). |
| 42 | |
| 43 | |
| 44 | SECTION 2: PCI Configuration Registers |
| 45 | ====================================== |
| 46 | |
| 47 | PCI Configuration Space |
| 48 | ----------------------- |
| 49 | |
| 50 | Each switch instance registers as a PCI device with PCI configuration space: |
| 51 | |
| 52 | offset  width  description          value |
| 53 | --------------------------------------------- |
| 54 | 0x0     2      Vendor ID            0x1b36 |
| 55 | 0x2     2      Device ID            0x0006 |
| 56 | 0x4     4      Command/Status |
| 57 | 0x8     1      Revision ID          0x01 |
| 58 | 0x9     3      Class code           0x2800 |
| 59 | 0xC     1      Cache line size |
| 60 | 0xD     1      Latency timer |
| 61 | 0xE     1      Header type |
| 62 | 0xF     1      Built-in self test |
| 63 | 0x10    4      Base address low |
| 64 | 0x14    4      Base address high |
| 65 | 0x18-28        Reserved |
| 66 | 0x2C    2      Subsystem vendor ID  * |
| 67 | 0x2E    2      Subsystem ID         * |
| 68 | 0x30-38        Reserved |
| 69 | 0x3C    1      Interrupt line |
| 70 | 0x3D    1      Interrupt pin        0x00 |
| 71 | 0x3E    1      Min grant            0x00 |
| 72 | 0x3F    1      Max latency          0x00 |
| 73 | 0x40    1      TRDY timeout |
| 74 | 0x41    1      Retry count |
| 75 | 0x42    2      Reserved |
| 76 | |
| 77 | |
| 78 | * Assigned by sub-system implementation |
| 79 | |
| 80 | SECTION 3: Memory-Mapped Register Space |
| 81 | ======================================= |
| 82 | |
| 83 | There are two memory-mapped BARs. BAR0 maps device register space and is |
| 84 | 0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in |
| 85 | size, allowing for 256 MSI-X vectors. |
| 86 | |
| 87 | All registers are 4 or 8 bytes long. It is assumed host software will access 4 |
| 88 | byte registers with one 4-byte access, and 8 byte registers with either two |
| 89 | 4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses, |
| 90 | access must be lower and then upper 4-bytes, in that order. |
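
The access-ordering rule above can be sketched in C as follows. This is only an
illustration under stated assumptions: reg_write64_split and reg_read64_split
are hypothetical helper names, the register is modeled as two adjacent 32-bit
words in ordinary memory, and a little-endian host is assumed, so no byte
swapping is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: 'reg' points at the lower 4 bytes of an
 * 8-byte register; reg[1] is the upper 4 bytes. */
static void reg_write64_split(volatile uint32_t *reg, uint64_t val)
{
    assert(reg != NULL);
    reg[0] = (uint32_t)(val & 0xffffffffu);  /* lower 4 bytes first */
    reg[1] = (uint32_t)(val >> 32);          /* then upper 4 bytes */
}

static uint64_t reg_read64_split(const volatile uint32_t *reg)
{
    uint32_t lo, hi;

    assert(reg != NULL);
    lo = reg[0];                             /* lower 4 bytes first */
    hi = reg[1];                             /* then upper 4 bytes */
    return ((uint64_t)hi << 32) | lo;
}
```

A real driver would go through the platform's MMIO accessors rather than plain
pointer dereferences; only the lower-then-upper ordering is the point here.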
| 91 | |
| 92 | BAR0 device register space is organized as follows: |
| 93 | |
| 94 | offset description |
| 95 | ------------------------------------------------------ |
| 96 | 0x0000-0x000f Bogus registers to catch misbehaving |
| 97 | drivers. Writes do nothing. Reads |
| 98 | back as 0xDEADBABE. |
| 99 | 0x0010-0x00ff Test registers |
| 100 | 0x0300-0x03ff General purpose registers |
| 101 | 0x1000-0x1fff Descriptor control |
| 102 | |
| 103 | Holes in register space are reserved. Writes to reserved registers do nothing. |
| 104 | Reads of reserved registers return 0. |
| 105 | |
| 106 | No fancy stuff like write-combining is enabled on any of the registers. |
| 107 | |
| 108 | BAR1 MSI-X register space is organized as follows: |
| 109 | |
| 110 | offset description |
| 111 | ------------------------------------------------------ |
| 112 | 0x0000-0x0fff MSI-X vector table (256 vectors total) |
| 113 | 0x1000-0x1fff MSI-X PBA table |
| 114 | |
| 115 | |
| 116 | SECTION 4: Interrupts, DMA, and Endianness |
| 117 | ========================================== |
| 118 | |
| 119 | PCI Interrupts |
| 120 | -------------- |
| 121 | |
| 122 | The device supports only MSI-X interrupts. BAR1 memory-mapped region contains |
| 123 | the MSI-X vector and PBA tables, with support for up to 256 MSI-X vectors. |
| 124 | |
| 125 | The vector assignment is: |
| 126 | |
| 127 | vector description |
| 128 | ----------------------------------------------------- |
| 129 | 0 Command descriptor ring completion |
| 130 | 1 Event descriptor ring completion |
| 131 | 2 Test operation completion |
| 132 | 3 RSVD |
| 133 | 4-255 Tx and Rx descriptor ring completion |
| 134 | Tx vector is even |
| 135 | Rx vector is odd |
| 136 | |
| 137 | An MSI-X vector table entry is 16 bytes: |
| 138 | |
| 139 | field       offset  width  description |
| 140 | ------------------------------------------------------------- |
| 141 | lower_addr  0x0     4      [31:2]  message address[31:2] |
| 142 |                            [1:0]   Rsvd (4 byte alignment |
| 143 |                                    required) |
| 144 | upper_addr  0x4     4      [31:15] Rsvd |
| 145 |                            [14:0]  message address[46:32] |
| 146 | data        0x8     4      message data[31:0] |
| 147 | control     0xc     4      [31:1]  Rsvd |
| 148 |                            [0]     mask (0 = enable, |
| 149 |                                    1 = masked) |
| 150 | |
| 151 | Software should install the Interrupt Service Routine (ISR) before any ports |
| 152 | are enabled or any commands are issued on the command ring. |
| 153 | |
| 154 | DMA Operations |
| 155 | -------------- |
| 156 | |
| 157 | DMA operations are used for packet DMA to/from the CPU, command and event |
| 158 | processing. Command processing includes statistical counters and table dumps, |
| 159 | table insertion/deletion, and more. Event processing provides an async |
| 160 | notification method for device-originating events. Each DMA operation has a |
| 161 | set of control registers to manage a descriptor ring. The descriptor rings are |
| 162 | allocated from contiguous host DMA-able memory and registers specify the ring's |
| 163 | base address, size and current head and tail indices. Software always writes |
| 164 | the head, and hardware always writes the tail. |
| 165 | |
| 166 | The high-order bit of DMA_DESC_COMP_ERR is used to mark hardware completion |
| 167 | of a descriptor. Software will clear this bit when posting a descriptor to the |
| 168 | ring, and hardware will set this bit when the descriptor is complete. |
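
The completion-bit handshake described above can be sketched as bit operations
on the 2-byte DMA_DESC_COMP_ERR value. The macro and function names below are
hypothetical, and how the status code is encoded in the remaining 15 bits is an
assumption made only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define COMP_ERR_DONE 0x8000u  /* hypothetical name for the high-order bit */

/* Software side: has hardware marked this descriptor complete? */
static int comp_err_is_done(uint16_t comp_err)
{
    return (comp_err & COMP_ERR_DONE) != 0;
}

/* Software side: clear the completion bit before posting a descriptor. */
static uint16_t comp_err_prepare(uint16_t comp_err)
{
    return (uint16_t)(comp_err & ~COMP_ERR_DONE);
}

/* Device-model side: set the completion bit alongside a status value.
 * Assumption: the status fits in the low 15 bits. */
static uint16_t comp_err_complete(uint16_t status)
{
    assert((status & COMP_ERR_DONE) == 0);
    return (uint16_t)(status | COMP_ERR_DONE);
}
```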
| 169 | |
| 170 | Descriptor ring sizes must be a power of 2 and range from 2 to 64K entries. |
| 171 | Descriptor rings' base address must be 8-byte aligned. Descriptors must be |
| 172 | packed within ring. Each descriptor in each ring must also be aligned on an 8 |
| 173 | byte boundary. Each descriptor ring will have these registers: |
| 174 | |
| 175 | DMA_DESC_xxx_BASE_ADDR, offset 0x1000 + (x * 32), 64-bit, (R/W) |
| 176 | DMA_DESC_xxx_SIZE, offset 0x1008 + (x * 32), 32-bit, (R/W) |
| 177 | DMA_DESC_xxx_HEAD, offset 0x100c + (x * 32), 32-bit, (R/W) |
| 178 | DMA_DESC_xxx_TAIL, offset 0x1010 + (x * 32), 32-bit, (R) |
| 179 | DMA_DESC_xxx_CTRL, offset 0x1014 + (x * 32), 32-bit, (W) |
| 180 | DMA_DESC_xxx_CREDITS, offset 0x1018 + (x * 32), 32-bit, (R/W) |
| 181 | DMA_DESC_xxx_RSVD1, offset 0x101c + (x * 32), 32-bit, (R/W) |
| 182 | |
| 183 | Where x is descriptor ring index: |
| 184 | |
| 185 | index ring |
| 186 | -------------------- |
| 187 | 0 CMD |
| 188 | 1 EVENT |
| 189 | 2 TX (port 0) |
| 190 | 3 RX (port 0) |
| 191 | 4 TX (port 1) |
| 192 | 5 RX (port 1) |
| 193 | . |
| 194 | . |
| 195 | . |
| 196 | 124 TX (port 61) |
| 197 | 125 RX (port 61) |
| 198 | 126 Resv |
| 199 | 127 Resv |
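
As a sketch of the addressing scheme above, the register offset for a given
ring and the ring index for a given port can be computed as follows. The
function names are hypothetical. The port argument is zero-based, matching
"TX (port 0)" in the table, which is an interpretation of the table rather than
something the text states explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets for ring 0, taken from the list above. */
#define DMA_DESC_BASE_ADDR 0x1000u
#define DMA_DESC_SIZE      0x1008u
#define DMA_DESC_HEAD      0x100cu
#define DMA_DESC_TAIL      0x1010u
#define DMA_DESC_CTRL      0x1014u
#define DMA_DESC_CREDITS   0x1018u

#define DMA_RING_COUNT     128u  /* indices 0..127; 126 and 127 are reserved */

/* Offset of a ring register: ring-0 offset plus 32 bytes per ring. */
static uint32_t dma_desc_reg(uint32_t ring0_offset, uint32_t ring)
{
    assert(ring < DMA_RING_COUNT);
    return ring0_offset + ring * 32u;
}

/* Ring indices 0 and 1 are CMD and EVENT; TX/RX pairs follow. */
static uint32_t tx_ring_index(uint32_t port)
{
    assert(port <= 61u);
    return 2u + 2u * port;
}

static uint32_t rx_ring_index(uint32_t port)
{
    assert(port <= 61u);
    return 3u + 2u * port;
}
```

For example, the HEAD register of the RX ring for port 0 (ring index 3) is at
0x100c + 3 * 32 = 0x106c.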
| 200 | |
| 201 | Writing BASE_ADDR or SIZE will reset HEAD and TAIL to zero. HEAD cannot be |
| 202 | written past TAIL. To do so would wrap the ring. An empty ring is when HEAD |
| 203 | == TAIL. A full ring is when HEAD is one position behind TAIL. Both HEAD and |
| 204 | TAIL increment and modulo wrap at the ring size. |
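
The empty, full, and wrap rules above can be sketched as small helpers. The
names are hypothetical, and the sketch relies on the ring size being a power of
2 (as required earlier) so that the modulo wrap reduces to a mask.

```c
#include <assert.h>
#include <stdint.h>

/* Ring size must be a power of 2 between 2 and 64K entries. */
static int ring_size_valid(uint32_t size)
{
    return size >= 2u && size <= 65536u && (size & (size - 1u)) == 0u;
}

/* Advance an index by one position, wrapping at the ring size. */
static uint32_t ring_next(uint32_t idx, uint32_t size)
{
    assert(ring_size_valid(size) && idx < size);
    return (idx + 1u) & (size - 1u);
}

/* Empty: HEAD == TAIL. */
static int ring_empty(uint32_t head, uint32_t tail)
{
    return head == tail;
}

/* Full: HEAD is one position behind TAIL. */
static int ring_full(uint32_t head, uint32_t tail, uint32_t size)
{
    return ring_next(head, size) == tail;
}
```

One consequence of these definitions is that a ring of N entries holds at most
N - 1 posted descriptors at a time.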
| 205 | |
| 206 | CTRL register bits: |
| 207 | |
| 208 | bit name description |
| 209 | ------------------------------------------------------------------------ |
| 210 | [0] CTRL_RESET Reset the descriptor ring |
| 211 | [31:1] Reserved |
| 212 | |
| 213 | All descriptor types share some common fields: |
| 214 | |
| 215 | field width description |
| 216 | ------------------------------------------------------------------- |
| 217 | DMA_DESC_BUF_ADDR 8 Phys addr of desc payload, 8-byte |
| 218 | aligned |
| 219 | DMA_DESC_COOKIE 8 Desc cookie for completion matching, |
| 220 | upper-most bit is reserved |
| 221 | DMA_DESC_BUF_SIZE 2 Desc payload size in bytes |
| 222 | DMA_DESC_TLV_SIZE 2 Desc payload total size in bytes |
| 223 | used for TLVs. Must be <= |
| 224 | DMA_DESC_BUF_SIZE. |
| 225 | DMA_DESC_COMP_ERR 2 Completion status of associated |
| 226 | desc payload. High order bit is |
| 227 | clear on new descs, toggled by |
| 228 | hw for completed items. |
| 229 | |
| 230 | To support forward- and backward-compatibility, descriptor and completion |
| 231 | payloads are specified in TLV format. Fields are packed with Type=field name, |
| 232 | Length=field length, and Value=field value. Software will ignore unknown fields |
| 233 | filled in by the switch. Likewise, the switch will ignore unknown fields |
| 234 | filled in by software. |
| 235 | |
| 236 | Descriptor payload buffer is 8-byte aligned and TLVs are 8-byte aligned. The |
| 237 | value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is: |
| 238 | |
| 239 | field width description |
| 240 | ----------------------------- |
| 241 | type 4 TLV type |
| 242 | len 2 TLV value length |
| 243 | pad 2 Reserved |
| 244 | |
| 245 | The alignment requirements for descriptors and TLVs are to avoid unaligned |
| 246 | access exceptions in software. Note that the payload for each TLV is also |
| 247 | 8 byte aligned. |
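
The header layout and padding rules above can be sketched as follows. tlv_put
and the other names are hypothetical helpers, not part of any driver API. The
sketch writes the type and length fields little-endian byte by byte and copies
the value as-is, so the caller is responsible for the value's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDR_SIZE 8u  /* type (4) + len (2) + pad (2) */
#define TLV_ALIGN    8u

/* Round a value length up to the next 8-byte boundary. */
static size_t tlv_align(size_t len)
{
    return (len + TLV_ALIGN - 1u) & ~(size_t)(TLV_ALIGN - 1u);
}

/* Bytes consumed in the buffer by one TLV: header plus padded value. */
static size_t tlv_total_size(uint16_t len)
{
    return TLV_HDR_SIZE + tlv_align(len);
}

/* Append one TLV at *off; returns 0 on success, -1 if it does not fit. */
static int tlv_put(uint8_t *buf, size_t cap, size_t *off,
                   uint32_t type, const void *val, uint16_t len)
{
    size_t total = tlv_total_size(len);
    uint8_t *p;

    assert(*off % TLV_ALIGN == 0);
    if (*off > cap || total > cap - *off)
        return -1;
    p = buf + *off;
    memset(p, 0, total);                      /* zero pad field and padding */
    p[0] = (uint8_t)(type & 0xffu);           /* type, little-endian */
    p[1] = (uint8_t)((type >> 8) & 0xffu);
    p[2] = (uint8_t)((type >> 16) & 0xffu);
    p[3] = (uint8_t)((type >> 24) & 0xffu);
    p[4] = (uint8_t)(len & 0xffu);            /* len, little-endian */
    p[5] = (uint8_t)((len >> 8) & 0xffu);
    memcpy(p + TLV_HDR_SIZE, val, len);
    *off += total;
    return 0;
}
```

With these rules a TLV whose value is 22 bytes long occupies 32 bytes of the
buffer, and one whose value is 2 bytes long occupies 16 bytes.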
| 248 | |
| 249 | Figure 1 shows an example descriptor buffer with two TLVs. |
| 250 | |
| 251 |                 <------- 8 bytes -------> |
| 252 | |
| 253 | 8-byte  +––––+  +–––––––––––+–––––+–––––+                     +–+ |
| 254 | align           |   type    | len | pad |   TLV#1 hdr          | |
| 255 |                 +–––––––––––+–––––+–––––+   (len=22)           | |
| 256 |                 |                       |                      | |
| 257 |                 |  value                |   TLV#1 value        | |
| 258 |                 |                       |   (padded to 8-byte  | |
| 259 |                 |                 +–––––+    alignment)        | |
| 260 |                 |                 |/////|                      | |
| 261 | 8-byte  +––––+  +–––––––––––+–––––––––––+                      | |
| 262 | align           |   type    | len | pad |   TLV#2 hdr    DESC_BUF_SIZE |
| 263 |                 +–––––+–––––+–––––+–––––+   (len=2)            | |
| 264 |                 |value|/////////////////|   TLV#2 value        | |
| 265 |                 +–––––+/////////////////|                      | |
| 266 |                 |///////////////////////|                      | |
| 267 |                 |///////////////////////|                      | |
| 268 |                 |///////////////////////|                      | |
| 269 |                 |////////unused/////////|                      | |
| 270 |                 |////////space//////////|                      | |
| 271 |                 |///////////////////////|                      | |
| 272 |                 |///////////////////////|                      | |
| 273 |                 |///////////////////////|                      | |
| 274 |                 +–––––––––––––––––––––––+                     +–+ |
| 275 | |
| 276 | fig. 1 |
| 277 | |
| 278 | TLVs can be nested within the NEST TLV type. |
| 279 | |
| 280 | Interrupt credits |
| 281 | ^^^^^^^^^^^^^^^^^ |
| 282 | |
| 283 | MSI-X vectors used for descriptor ring completions use a credit mechanism for |
| 284 | efficient device, PCIe bus, OS and driver operations. Each descriptor ring has |
| 285 | a credit count which represents the number of outstanding descriptors to be |
| 286 | processed by the driver. As the device marks descriptors complete, the credit |
| 287 | count is incremented. As the driver processes those outstanding descriptors, |
| 288 | it returns credits back to the device. This way, the device knows the driver's |
| 289 | progress and can make decisions about when to fire the next interrupt or not. |
| 290 | When the credit count is zero, and the first descriptors are posted for the |
| 291 | driver, a single interrupt is fired. Once the interrupt is fired, the |
| 292 | interrupt is disabled (auto-masked*). In response to the interrupt, the driver |
| 293 | will process descriptors and PIO write a returned credit value for that |
| 294 | descriptor ring. If the driver returns all credits (the driver caught up with |
| 295 | the device and there is no outstanding work), then the interrupt is unmasked, |
| 296 | but not fired. If only partial credits are returned, the interrupt remains |
| 297 | masked but the device generates an interrupt, signaling the driver that more |
| 298 | outstanding work is available. |
| 299 | |
| 300 | (* this masking is unrelated to the MSI-X interrupt mask register) |
| 301 | |
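
The credit mechanism can be sketched as a small device-side model. This follows
the prose above and is not taken from any actual device implementation; the
structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ring_irq_model {
    uint32_t credits;  /* completed descriptors not yet returned by driver */
    int masked;        /* auto-mask state (not the MSI-X mask bit) */
    unsigned fired;    /* number of interrupts raised so far */
};

/* Device marks one descriptor complete. */
static void model_complete(struct ring_irq_model *r)
{
    r->credits++;
    if (!r->masked) {
        r->fired++;    /* first outstanding work: fire once ... */
        r->masked = 1; /* ... then auto-mask */
    }
}

/* Driver writes 'n' returned credits to DMA_DESC_xxx_CREDITS. */
static void model_return_credits(struct ring_irq_model *r, uint32_t n)
{
    assert(n <= r->credits);
    r->credits -= n;
    if (r->credits == 0)
        r->masked = 0; /* driver caught up: unmask, do not fire */
    else
        r->fired++;    /* work still outstanding: fire, stay masked */
}
```

In this model, three completions followed by a return of two credits produce
two interrupts in total, and returning the last credit unmasks the interrupt
without firing it.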
| 302 | Endianness |
| 303 | ---------- |
| 304 | |
| 305 | Device registers are hard-coded to little-endian (LE). The driver should |
| 306 | convert to/from host endianness to LE for device register accesses. |
| 307 | |
| 308 | Descriptors are LE. Descriptor buffer TLVs will have LE type and length |
| 309 | fields, but the value field can either be LE or network-byte-order, depending |
| 310 | on context. TLV values containing network packet data will be in network-byte |
| 311 | order. A TLV value containing a field or mask used to compare against network |
| 312 | packet data is network-byte order. For example, flow match fields (and masks) |
| 313 | are network-byte-order since they're matched directly, byte-by-byte, against |
| 314 | network packet data. All non-network-packet TLV multi-byte values will be LE. |
| 315 | |
| 316 | TLV values in network-byte-order are designated with (N). |
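
A minimal sketch of the two byte orders in play, using hypothetical helper
names. Writing values byte by byte, as here, gives the same result on any host
endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value little-endian (registers, descriptors, LE TLVs). */
static void put_le32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a 16-bit value in network byte order, for TLV values marked (N). */
static void put_be16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)((v >> 8) & 0xffu);
    p[1] = (uint8_t)(v & 0xffu);
}
```

For instance, a VLAN ID of 100 used as a match field would be stored with
put_be16, while a 4-byte physical port number would be stored with put_le32.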
| 317 | |
| 318 | |
| 319 | SECTION 5: Test Registers |
| 320 | ========================= |
| 321 | |
| 322 | Rocker has several test registers to support troubleshooting register access, |
| 323 | interrupt generation, and DMA operations: |
| 324 | |
| 325 | TEST_REG, offset 0x0010, 32-bit (R/W) |
| 326 | TEST_REG64, offset 0x0018, 64-bit (R/W) |
| 327 | TEST_IRQ, offset 0x0020, 32-bit (R/W) |
| 328 | TEST_DMA_ADDR, offset 0x0028, 64-bit (R/W) |
| 329 | TEST_DMA_SIZE, offset 0x0030, 32-bit (R/W) |
| 330 | TEST_DMA_CTRL, offset 0x0034, 32-bit (R/W) |
| 331 | |
| 332 | Reads of TEST_REG and TEST_REG64 return a value equal to twice the last |
| 333 | value written to the register. The 32-bit and 64-bit versions are for testing |
| 334 | 32-bit and 64-bit host accesses. |
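
A register-access self-test built on this behavior might look like the sketch
below. The device is replaced by a hypothetical in-memory mock so the check can
be shown end to end. That the doubled value wraps modulo 2^32 is an assumption
of this sketch, not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the 32-bit TEST_REG. */
struct mock_dev {
    uint32_t test_reg;
};

static void mock_write_test_reg(struct mock_dev *d, uint32_t v)
{
    assert(d != NULL);
    d->test_reg = v;
}

/* Reads return twice the last value written (assumed to wrap). */
static uint32_t mock_read_test_reg(const struct mock_dev *d)
{
    assert(d != NULL);
    return d->test_reg * 2u;
}

/* Driver-side check: write a value, expect twice that value back. */
static int check_test_reg(struct mock_dev *d, uint32_t v)
{
    mock_write_test_reg(d, v);
    return mock_read_test_reg(d) == (uint32_t)(v * 2u);
}
```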
| 335 | |
| 336 | A vector can be written to TEST_IRQ and the device will generate an interrupt |
| 337 | for that vector. |
| 338 | |
| 339 | To test basic DMA operations, allocate a DMA-able host buffer and put the |
| 340 | buffer address into TEST_DMA_ADDR and size into TEST_DMA_SIZE. Then, write to |
| 341 | TEST_DMA_CTRL to manipulate the buffer contents. TEST_DMA_CTRL operations are: |
| 342 | |
| 343 | operation value description |
| 344 | ----------------------------------------------------------- |
| 345 | TEST_DMA_CTRL_CLEAR 1 clear buffer |
| 346 | TEST_DMA_CTRL_FILL 2 fill buffer bytes with 0x96 |
| 347 | TEST_DMA_CTRL_INVERT 4 invert bytes in buffer |
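
The three operations can be modeled on a plain buffer to show what a driver
self-test would expect to find after each write to TEST_DMA_CTRL. The function
names are hypothetical, and treating "clear" as "set every byte to zero" is an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_DMA_CTRL_CLEAR  1u
#define TEST_DMA_CTRL_FILL   2u
#define TEST_DMA_CTRL_INVERT 4u

/* Model of what the device does to the test buffer for each operation. */
static void model_test_dma(uint8_t *buf, size_t size, uint32_t ctrl)
{
    size_t i;

    assert(buf != NULL);
    switch (ctrl) {
    case TEST_DMA_CTRL_CLEAR:
        memset(buf, 0, size);         /* assumed: clear means all zeros */
        break;
    case TEST_DMA_CTRL_FILL:
        memset(buf, 0x96, size);
        break;
    case TEST_DMA_CTRL_INVERT:
        for (i = 0; i < size; i++)
            buf[i] = (uint8_t)~buf[i];
        break;
    default:
        break;                        /* unknown operation: leave buffer */
    }
}

/* Helper for checking the result: is every byte equal to 'want'? */
static int buf_all_equal(const uint8_t *buf, size_t size, uint8_t want)
{
    size_t i;

    for (i = 0; i < size; i++)
        if (buf[i] != want)
            return 0;
    return 1;
}
```

A fill followed by an invert should leave every byte equal to 0x69, the bitwise
complement of 0x96.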
| 348 | |
| 349 | Various buffer addresses and sizes should be tested to verify that no address |
| 350 | boundary issues exist. In particular, buffers that start on odd-8-byte boundary |
| 351 | and/or span multiple pages should be tested. |
| 352 | |
| 353 | |
| 354 | SECTION 6: Ports |
| 355 | ================ |
| 356 | |
| 357 | Physical and Logical Ports |
| 358 | -------------------------- |
| 359 | |
| 360 | The switch supports up to 62 physical (front-panel) ports. Register |
| 361 | PORT_PHYS_COUNT returns the actual number of physical ports available: |
| 362 | |
| 363 | PORT_PHYS_COUNT, offset 0x0304, 32-bit, (R) |
| 364 | |
| 365 | In addition to front-panel ports, the switch supports logical ports for |
| 366 | tunnels. |
| 367 | |
| 368 | Front-panel ports and logical tunnel ports are mapped into a single 32-bit port |
| 369 | space. A special CPU port is assigned port 0. The front-panel ports are |
| 370 | mapped to ports 1-62. A special loopback port is assigned port 63. Logical |
| 371 | tunnel ports are assigned ports 0x00010000-0x0001ffff. |
| 372 | To summarize the port assignments: |
| 373 | |
| 374 | port mapping |
| 375 | ------------------------------------------------------- |
| 376 | 0 CPU port (for packets to/from host CPU) |
| 377 | 1-62 front-panel physical ports |
| 378 | 63 loopback port |
| 379 | 64-0x0000ffff RSVD |
| 380 | 0x00010000-0x0001ffff logical tunnel ports |
| 381 | 0x00020000-0xffffffff RSVD |
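
The port-space layout in the table can be expressed as a classification
function. The enum and function names are hypothetical and exist only for this
sketch.

```c
#include <assert.h>
#include <stdint.h>

enum port_kind {
    PORT_KIND_CPU,
    PORT_KIND_FRONT_PANEL,
    PORT_KIND_LOOPBACK,
    PORT_KIND_TUNNEL,
    PORT_KIND_RSVD
};

/* Map a 32-bit port number onto the ranges in the table above. */
static enum port_kind port_classify(uint32_t port)
{
    if (port == 0u)
        return PORT_KIND_CPU;
    if (port <= 62u)
        return PORT_KIND_FRONT_PANEL;
    if (port == 63u)
        return PORT_KIND_LOOPBACK;
    if (port >= 0x00010000u && port <= 0x0001ffffu)
        return PORT_KIND_TUNNEL;
    return PORT_KIND_RSVD;
}

/* Port number of the index-th logical tunnel port (index 0..0xffff). */
static uint32_t tunnel_port(uint32_t index)
{
    assert(index <= 0xffffu);
    return 0x00010000u + index;
}
```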
| 382 | |
| 383 | Physical Port Mode |
| 384 | ------------------ |
| 385 | |
| 386 | Switch front-panel ports operate in a mode. Currently, the only mode is |
| 387 | OF-DPA. OF-DPA[1] mode is based on OpenFlow Data Plane Abstraction (OF-DPA) |
| 388 | Abstract Switch Specification, Version 1.0, from Broadcom Corporation. To |
| 389 | set/get the mode for front-panel ports, see port settings, below. |
| 390 | |
| 391 | Port Settings |
| 392 | ------------- |
| 393 | |
| 394 | Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS: |
| 395 | |
| 396 | PORT_PHYS_LINK_STATUS, offset 0x0310, 64-bit, (R) |
| 397 | |
| 398 | Value is port bitmap. Bits 0 and 63 always read 0. Bits 1-62 |
| 399 | read 1 for link UP and 0 for link DOWN for respective front-panel ports. |
| 400 | |
| 401 | Other properties for front-panel ports are available via DMA CMD descriptors: |
| 402 | |
| 403 | Get PORT_SETTINGS descriptor: |
| 404 | |
| 405 | field width description |
| 406 | ---------------------------------------------- |
| 407 | PORT_SETTINGS 2 CMD_GET |
| 408 | PPORT 4 Physical port # |
| 409 | |
| 410 | Get PORT_SETTINGS completion: |
| 411 | |
| 412 | field width description |
| 413 | ---------------------------------------------- |
| 414 | PPORT 4 Physical port # |
| 415 | SPEED 4 Current port interface speed, in Mbps |
| 416 | DUPLEX 1 1 = Full, 0 = Half |
| 417 | AUTONEG 1 1 = enabled, 0 = disabled |
| 418 | MACADDR 6 Port MAC address |
| 419 | MODE 1 0 = OF-DPA |
| 420 | LEARNING 1 MAC address learning on port |
| 421 | 1 = enabled |
| 422 | 0 = disabled |
| 423 | PHYS_NAME <var> Physical port name (string) |
| 424 | |
| 425 | Set PORT_SETTINGS descriptor: |
| 426 | |
| 427 | field width description |
| 428 | ---------------------------------------------- |
| 429 | PORT_SETTINGS 2 CMD_SET |
| 430 | PPORT 4 Physical port # |
| 431 | SPEED 4 Port interface speed, in Mbps |
| 432 | DUPLEX 1 1 = Full, 0 = Half |
| 433 | AUTONEG 1 1 = enabled, 0 = disabled |
| 434 | MACADDR 6 Port MAC address |
| 435 | MODE 1 0 = OF-DPA |
| 436 | |
| 437 | Port Enable |
| 438 | ----------- |
| 439 | |
| 440 | Front-panel ports are initially disabled, which means port ingress and egress |
| 441 | packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE: |
| 442 | |
| 443 | PORT_PHYS_ENABLE: offset 0x0318, 64-bit, (R/W) |
| 444 | |
| 445 | Value is bitmap of first 64 ports. Bits 0 and 63 are ignored |
| 446 | and always read as 0. Write 1 to enable port; write 0 to disable it. |
| 447 | Default is 0. |
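
Both PORT_PHYS_LINK_STATUS and PORT_PHYS_ENABLE use the same bitmap layout, with
bit n standing for front-panel port n. A sketch of helpers for reading and
composing such bitmaps follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Front-panel ports occupy bits 1-62 of the 64-bit bitmaps. */
static int pport_valid(uint32_t pport)
{
    return pport >= 1u && pport <= 62u;
}

/* Test one port's bit in a PORT_PHYS_LINK_STATUS value. */
static int pport_link_up(uint64_t link_status, uint32_t pport)
{
    assert(pport_valid(pport));
    return (int)((link_status >> pport) & 1u);
}

/* Return a new PORT_PHYS_ENABLE value with one port's bit set or cleared. */
static uint64_t pport_enable_set(uint64_t enable, uint32_t pport, int on)
{
    uint64_t bit;

    assert(pport_valid(pport));
    bit = (uint64_t)1 << pport;
    return on ? (enable | bit) : (enable & ~bit);
}
```

Because PORT_PHYS_ENABLE is read/write, a driver would typically read the
current value, adjust one bit with a helper like pport_enable_set, and write
the result back.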
| 448 | |
| 449 | |
| 450 | SECTION 7: Switch Control |
| 451 | ========================= |
| 452 | |
| 453 | This section covers switch-wide register settings. |
| 454 | |
| 455 | Control |
| 456 | ------- |
| 457 | |
| 458 | This register is used for low level control of the switch. |
| 459 | |
| 460 | CONTROL: offset 0x0300, 32-bit, (W) |
| 461 | |
| 462 | bit name description |
| 463 | ------------------------------------------------------------------------ |
| 464 | [0] CONTROL_RESET If set, device will perform reset |
| 465 | [31:1] Reserved |
| 466 | |
| 467 | Switch ID |
| 468 | --------- |
| 469 | |
| 470 | The switch has a SWITCH_ID to be used by software to uniquely identify the |
| 471 | switch: |
| 472 | |
| 473 | SWITCH_ID: offset 0x0320, 64-bit, (R) |
| 474 | |
| 475 | Value is opaque to switch software and no special encoding is implied. |
| 476 | |
| 477 | |
| 478 | SECTION 8: Events |
| 479 | ================= |
| 480 | |
| 481 | The device reports non-I/O asynchronous events to the host using the event |
| 482 | ring. The TLV structure for events is: |
| 483 | |
| 484 | field width description |
| 485 | --------------------------------------------------- |
| 486 | TYPE 4 Event type, one of: |
| 487 | 1: LINK_CHANGED |
| 488 | 2: MAC_VLAN_SEEN |
| 489 | INFO <nest> Event info (details below) |
| 490 | |
| 491 | Link Changed Event |
| 492 | ------------------ |
| 493 | |
| 494 | When link status changes on a physical port, this event is generated. |
| 495 | |
| 496 | field width description |
| 497 | --------------------------------------------------- |
| 498 | INFO <nest> |
| 499 | PPORT 4 Physical port |
| 500 | LINKUP 1 Link status: |
| 501 | 0: down |
| 502 | 1: up |
| 503 | |
| 504 | MAC VLAN Seen Event |
| 505 | ------------------- |
| 506 | |
| 507 | When a packet ingresses on a port and the source MAC/VLAN isn't known to the |
| 508 | device, the device will generate this event. In response to the event, the |
| 509 | driver should install the MAC/VLAN for the port into the device's bridge |
| 510 | table. Once installed, the MAC/VLAN is known on the port and this event will |
| 511 | no longer be generated. |
| 512 | |
| 513 | field width description |
| 514 | --------------------------------------------------- |
| 515 | INFO <nest> |
| 516 | PPORT 4 Physical port |
| 517 | MAC 6 MAC address |
| 518 | VLAN 2 VLAN ID |
| 519 | |
| 520 | |
| 521 | SECTION 9: CPU Packet Processing |
| 522 | ================================ |
| 523 | |
| 524 | Ingress packets directed to the host CPU for further processing are delivered |
| 525 | in the DMA RX ring. Likewise, host CPU originating packets destined to egress |
| 526 | on switch ports are scheduled by software using the DMA TX ring. |
| 527 | |
| 528 | Tx Packet Processing |
| 529 | -------------------- |
| 530 | |
| 531 | Software schedules packets for egress on switch ports using the DMA TX ring. A |
| 532 | TX descriptor buffer describes the packet location and size in host DMA-able |
| 533 | memory, the destination port, and any hardware-offload functions (such as L3 |
| 534 | payload checksum offload). Software then bumps the descriptor head to signal |
| 535 | hardware of new Tx work. In response, hardware will DMA read Tx descriptors up |
| 536 | to head, DMA read descriptor buffer and packet data, perform offloading |
| 537 | functions, and finally frame packet on wire (network). Once packet processing |
| 538 | is complete, hardware will write back status to descriptor(s) to signal to |
| 539 | software that Tx is complete and software resources (e.g. skb) backing packet |
| 540 | can be released. |
| 541 | |
| 542 | Figure 2 shows an example 3-fragment packet queued with one Tx descriptor. A |
| 543 | TLV is used for each packet fragment. |
| 544 | |
| 545 | pkt frag 1 |
| 546 | +–––––––+ +–+ |
| 547 | +–––+ | | |
| 548 | desc buf | | | | |
| 549 | +––––––––+ | | | | |
| 550 | Tx ring +–––+ +–––––+ | | | |
| 551 | +–––––––––+ | | TLVs | +–––––––+ | |
| 552 | | +–––+ +––––––––+ pkt frag 2 | |
| 553 | | desc 0 | | +–––––+ +–––––––+ | |
| 554 | +–––––––––+ | TLVs | +–––+ | | |
| 555 | head+–+ | +––––––––+ | | | |
| 556 | | desc 1 | | +–––––+ +–––––––+ |pkt |
| 557 | +–––––––––+ | TLVs | | | |
| 558 | | | +––––––––+ | pkt frag 3 | |
| 559 | | | | +–––––––+ | |
| 560 | +–––––––––+ +–––+ | | |
| 561 | | | | | | |
| 562 | | | | | | |
| 563 | +–––––––––+ | | | |
| 564 | | | | | | |
| 565 | | | | | | |
| 566 | +–––––––––+ | | | |
| 567 | | | +–––––––+ +–+ |
| 568 | | | |
| 569 | +–––––––––+ |
| 570 | |
| 571 | fig 2. |
| 572 | |
| 573 | The TLVs for Tx descriptor buffer are: |
| 574 | |
| 575 | field width description |
| 576 | --------------------------------------------------------------------- |
| 577 | PPORT 4 Destination physical port # |
| 578 | TX_OFFLOAD 1 Hardware offload modes: |
| 579 | 0: no offload |
| 580 | 1: insert IP csum (ipv4 only) |
| 581 | 2: insert TCP/UDP csum |
| 582 | 3: L3 csum calc and insert |
| 583 | into csum offset (TX_L3_CSUM_OFF) |
| 584 | 16-bit 1's complement csum value. |
| 585 | IPv4 pseudo-header and IP |
| 586 | already calculated by OS |
| 587 | and inserted. |
| 588 | 4: TSO (TCP Segmentation Offload) |
| 589 | TX_L3_CSUM_OFF 2 For L3 csum offload mode, the offset, |
| 590 | from the beginning of the packet, |
| 591 | of the csum field in the L3 header |
| 592 | TX_TSO_MSS 2 For TSO offload mode, the |
| 593 | Maximum Segment Size in bytes |
| 594 | TX_TSO_HDR_LEN 2 For TSO offload mode, the |
| 595 | length of ethernet, IP, and |
| 596 | TCP/UDP headers, including IP |
| 597 | and TCP options. |
| 598 | TX_FRAGS <array> Packet fragments |
| 599 | TX_FRAG <nest> Packet fragment |
| 600 | TX_FRAG_ADDR 8 DMA address of packet fragment |
| 601 | TX_FRAG_LEN 2 Packet fragment length |
| 602 | |
| 603 | Possible status return codes in descriptor on completion are: |
| 604 | |
| 605 | DESC_COMP_ERR reason |
| 606 | -------------------------------------------------------------------- |
| 607 | 0 OK |
| 608 | -ROCKER_ENXIO address or data read err on desc buf or packet |
| 609 | fragment |
| 610 | -ROCKER_EINVAL bad pport or TSO or csum offloading error |
| 611 | -ROCKER_ENOMEM no memory for internal staging tx fragment |
| 612 | |
| 613 | Rx Packet Processing |
| 614 | -------------------- |
| 615 | |
| 616 | Packets ingressing on switch ports that are not forwarded by the switch, but |
| 617 | rather directed to the host CPU for further processing, are delivered in the DMA |
| 618 | RX ring. Rx descriptor buffers are allocated by software and placed on the |
| 619 | ring. Hardware will fill Rx descriptor buffers with packet data, write the |
| 620 | completion, and signal to software that a new packet is ready. Since Rx packet |
| 621 | size is not known a priori, the Rx descriptor buffer must be allocated for |
| 622 | worst-case packet size. A single Rx descriptor will contain the entire Rx |
| 623 | packet data in one RX_FRAG. Other Rx TLVs describe any hardware offloads |
| 624 | performed on the packet, such as checksum validation. |
| 625 | |
| 626 | The TLVs for Rx descriptor buffer are: |
| 627 | |
| 628 | field width description |
| 629 | --------------------------------------------------- |
| 630 | PPORT 4 Source physical port # |
| 631 | RX_FLAGS 2 Packet parsing flags: |
| 632 | (1 << 0): IPv4 packet |
| 633 | (1 << 1): IPv6 packet |
| 634 | (1 << 2): csum calculated |
| 635 | (1 << 3): IPv4 csum good |
| 636 | (1 << 4): IP fragment |
| 637 | (1 << 5): TCP packet |
| 638 | (1 << 6): UDP packet |
| 639 | (1 << 7): TCP/UDP csum good |
| 640 | (1 << 8): Offload forward |
| 641 | RX_CSUM 2 IP calculated checksum: |
| 642 | IPv4: IP payload csum |
| 643 | IPv6: header and payload csum |
| 644 | (Only valid if RX_FLAGS:csum calc is set) |
| 645 | RX_FRAG_ADDR 8 DMA address of packet fragment |
| 646 | RX_FRAG_MAX_LEN 2 Packet maximum fragment length |
| 647 | RX_FRAG_LEN 2 Actual packet fragment length after receive |
| 648 | |
| 649 | Offload forward RX_FLAG indicates the device has already forwarded the packet |
| 650 | so the host CPU should not also forward the packet. |
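
The RX_FLAGS bits listed above can be decoded as in the sketch below. The macro
and function names are hypothetical. The policy in rx_csum_verified (which
flags must be set before software can skip its own checksum validation) is one
possible reading, not something this document mandates.

```c
#include <assert.h>
#include <stdint.h>

#define RX_FLAGS_IPV4             (1u << 0)
#define RX_FLAGS_IPV6             (1u << 1)
#define RX_FLAGS_CSUM_CALC        (1u << 2)
#define RX_FLAGS_IPV4_CSUM_GOOD   (1u << 3)
#define RX_FLAGS_IP_FRAG          (1u << 4)
#define RX_FLAGS_TCP              (1u << 5)
#define RX_FLAGS_UDP              (1u << 6)
#define RX_FLAGS_TCP_UDP_CSUM_GOOD (1u << 7)
#define RX_FLAGS_FWD_OFFLOAD      (1u << 8)

/* Assumed policy: trust hardware only for unfragmented TCP/UDP packets
 * whose L4 checksum, and IPv4 header checksum if IPv4, were good. */
static int rx_csum_verified(uint16_t flags)
{
    /* A packet cannot be both IPv4 and IPv6. */
    assert(!((flags & RX_FLAGS_IPV4) && (flags & RX_FLAGS_IPV6)));

    if (!(flags & (RX_FLAGS_TCP | RX_FLAGS_UDP)))
        return 0;
    if (flags & RX_FLAGS_IP_FRAG)
        return 0;
    if (!(flags & RX_FLAGS_TCP_UDP_CSUM_GOOD))
        return 0;
    if ((flags & RX_FLAGS_IPV4) && !(flags & RX_FLAGS_IPV4_CSUM_GOOD))
        return 0;
    return 1;
}

/* The host should forward only if the device has not already done so. */
static int rx_host_should_forward(uint16_t flags)
{
    return !(flags & RX_FLAGS_FWD_OFFLOAD);
}
```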
| 651 | |
| 652 | Possible status return codes in descriptor on completion are: |
| 653 | |
| 654 | DESC_COMP_ERR reason |
| 655 | -------------------------------------------------------------------- |
| 656 | 0 OK |
| 657 | -ROCKER_ENXIO address or data read err on desc buf |
| 658 | -ROCKER_ENOMEM no memory for internal staging desc buf |
| 659 | -ROCKER_EMSGSIZE Rx descriptor buffer wasn't big enough to contain |
| 660 | packet data TLV and other TLVs. |
| 661 | |
| 662 | |
| 663 | SECTION 10: OF-DPA Mode |
| 664 | ======================= |
| 665 | |
| 666 | OF-DPA mode allows the switch to offload flow packet processing functions to |
| 667 | hardware. An OpenFlow controller would communicate with an OpenFlow agent |
| 668 | installed on the switch. The OpenFlow agent would (directly or indirectly) |
| 669 | communicate with the Rocker switch driver, which in turn would program switch |
| 670 | hardware with flow functionality, as defined in OF-DPA. The block diagram is: |
| 671 | |
| 672 | +–––––––––––––––----–––+ |
| 673 | | OF | |
| 674 | | Remote Controller | |
| 675 | +––––––––+––----–––––––+ |
| 676 | | |
| 677 | | |
| 678 | +––––––––+–––––––––+ |
| 679 | | OF | |
| 680 | | Local Agent | |
| 681 | +––––––––––––––––––+ |
| 682 | | | |
| 683 | | Rocker Driver | |
| 684 | +––––––––––––––––––+ |
| 685 | <this spec> |
| 686 | +––––––––––––––––––+ |
| 687 | | | |
| 688 | | Rocker Switch | |
| 689 | +––––––––––––––––––+ |
| 690 | |
| 691 | To participate in flow functions, ports must be configured for OF-DPA mode |
| 692 | during switch initialization. |
| 693 | |
| 694 | OF-DPA Flow Table Interface |
| 695 | --------------------------- |
| 696 | |
| 697 | There are commands to add, modify, delete, and get stats of flow table entries. |
| 698 | The commands are issued using the DMA CMD descriptor ring. The following |
| 699 | commands are defined: |
| 700 | |
| 701 | CMD_ADD: add an entry to flow table |
| 702 | CMD_MOD: modify an entry in flow table |
| 703 | CMD_DEL: delete an entry from flow table |
| 704 | CMD_GET_STATS: get stats for flow entry |
| 705 | |
| 706 | TLVs for add and modify commands are: |
| 707 | |
| 708 | field width description |
| 709 | ---------------------------------------------------- |
| 710 | OF_DPA_CMD 2 CMD_[ADD|MOD] |
| 711 | OF_DPA_TBL 2 Flow table ID |
| 712 | 0: ingress port |
| 713 | 10: vlan |
| 714 | 20: termination mac |
| 715 | 30: unicast routing |
| 716 | 40: multicast routing |
| 717 | 50: bridging |
| 718 | 60: ACL policy |
| 719 | OF_DPA_PRIORITY 4 Flow priority |
| 720 | OF_DPA_HARDTIME 4 Hard timeout for flow |
| 721 | OF_DPA_IDLETIME 4 Idle timeout for flow |
| 722 | OF_DPA_COOKIE 8 Cookie |
| 723 | |
| 724 | Additional TLVs based on flow table ID: |
| 725 | |
| 726 | Table ID 0: ingress port |
| 727 | |
| 728 | field width description |
| 729 | ---------------------------------------------------- |
| 730 | OF_DPA_IN_PPORT 4 ingress physical port number |
| 731 | OF_DPA_GOTO_TBL 2 goto table ID; zero to drop |
| 732 | |
| 733 | Table ID 10: vlan |
| 734 | |
| 735 | field width description |
| 736 | ---------------------------------------------------- |
| 737 | OF_DPA_IN_PPORT 4 ingress physical port number |
| 738 | OF_DPA_VLAN_ID 2 (N) vlan ID |
| 739 | OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask |
| 740 | OF_DPA_GOTO_TBL 2 goto table ID; zero to drop |
| 741 | OF_DPA_NEW_VLAN_ID 2 (N) new vlan ID |
| 742 | |
| 743 | Table ID 20: termination mac |
| 744 | |
| 745 | field width description |
| 746 | ---------------------------------------------------- |
| 747 | OF_DPA_IN_PPORT 4 ingress physical port number |
| 748 | OF_DPA_IN_PPORT_MASK 4 ingress physical port number mask |
| 749 | OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd |
| 750 | OF_DPA_DST_MAC 6 (N) destination MAC |
| 751 | OF_DPA_DST_MAC_MASK 6 (N) destination MAC mask |
| 752 | OF_DPA_VLAN_ID 2 (N) vlan ID |
| 753 | OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask |
| 754 | OF_DPA_GOTO_TBL 2 only acceptable values are |
| 755 | unicast or multicast routing |
| 756 | table IDs |
| 757 | OF_DPA_OUT_PPORT 2 if specified, must be |
| 758 | controller, set zero otherwise |
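
The termination MAC table restricts both the ethertype and the goto table. A minimal Python sketch of those two checks is shown here. The constant and function names are hypothetical, and the other fields in the table are not validated.

```python
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
TBL_UNICAST_ROUTING = 30
TBL_MULTICAST_ROUTING = 40


def check_term_mac(ethertype, goto_tbl):
    """Return a list of constraint violations for a termination MAC entry."""
    problems = []
    if ethertype not in (ETH_P_IP, ETH_P_IPV6):
        problems.append("ethertype must be 0x0800 or 0x86dd")
    if goto_tbl not in (TBL_UNICAST_ROUTING, TBL_MULTICAST_ROUTING):
        problems.append("goto table must be unicast (30) or multicast (40) routing")
    return problems
```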
| 759 | |
| 760 | Table ID 30: unicast routing |
| 761 | |
| 762 | field width description |
| 763 | ---------------------------------------------------- |
| 764 | OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd |
| 765 | OF_DPA_DST_IP 4 (N) destination IPv4 address. |
| 766 | Must be unicast address |
| 767 | OF_DPA_DST_IP_MASK 4 (N) IP mask. Must be prefix mask |
| 768 | OF_DPA_DST_IPV6 16 (N) destination IPv6 address. |
| 769 | Must be unicast address |
| 770 | OF_DPA_DST_IPV6_MASK 16 (N) IPv6 mask. Must be prefix mask |
| 771 | OF_DPA_GOTO_TBL 2 goto table ID; zero to drop |
| 772 | OF_DPA_GROUP_ID 4 data for GROUP action must |
| 773 | be an L3 Unicast group entry |
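
The "must be unicast address" and "must be prefix mask" requirements can be checked before a command is issued. The Python sketch below is one way to do so, under two assumptions: "unicast" is approximated as "not multicast" (broadcast addresses are not rejected), and a prefix mask is a run of one bits followed only by zero bits. The function names are hypothetical.

```python
import ipaddress


def is_prefix_mask(mask, bits):
    """True if mask is contiguous ones followed by contiguous zeros."""
    full = (1 << bits) - 1
    inverted = ~mask & full
    # A prefix mask inverts to 0...01...1, which shares no set bit with itself plus one.
    return 0 <= mask <= full and (inverted & (inverted + 1)) == 0


def check_unicast_route(dst, mask):
    """Validate an IPv4 or IPv6 destination and mask given as strings."""
    addr = ipaddress.ip_address(dst)
    mask_int = int(ipaddress.ip_address(mask))
    problems = []
    if addr.is_multicast:
        problems.append("destination must be a unicast address")
    if not is_prefix_mask(mask_int, addr.max_prefixlen):
        problems.append("mask must be a prefix mask")
    return problems
```

The same helper handles both address families because the bit width is taken from the parsed destination address.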
| 774 | |
| 775 | Table ID 40: multicast routing |
| 776 | |
| 777 | field width description |
| 778 | ---------------------------------------------------- |
| 779 | OF_DPA_ETHERTYPE 2 (N) must be either 0x0800 or 0x86dd |
| 780 | OF_DPA_VLAN_ID 2 (N) vlan ID |
| 781 | OF_DPA_SRC_IP 4 (N) source IPv4. Optional, |
| 782 | can contain IPv4 address, |
| 783 | must be completely masked |
| 784 | if not used |
| 785 | OF_DPA_SRC_IP_MASK 4 (N) IP Mask |
| 786 | OF_DPA_DST_IP 4 (N) destination IPv4 address. |
| 787 | Must be multicast address |
| 788 | OF_DPA_SRC_IPV6 16 (N) source IPv6 Address. Optional. |
| 789 | Can contain IPv6 address, |
| 790 | must be completely masked |
| 791 | if not used |
| 792 | OF_DPA_SRC_IPV6_MASK 16 (N) IPv6 mask. |
| 793 | OF_DPA_DST_IPV6 16 (N) destination IPv6 Address. Must |
| 794 | be multicast address |
| 796 | OF_DPA_GOTO_TBL 2 goto table ID; zero to drop |
| 797 | OF_DPA_GROUP_ID 4 data for GROUP action must |
| 798 | be an L3 multicast group entry |
| 799 | |
| 800 | Table ID 50: bridging |
| 801 | |
| 802 | field width description |
| 803 | ---------------------------------------------------- |
| 804 | OF_DPA_VLAN_ID 2 (N) vlan ID |
| 805 | OF_DPA_TUNNEL_ID 4 tunnel ID |
| 806 | OF_DPA_DST_MAC 6 (N) destination MAC |
| 807 | OF_DPA_DST_MAC_MASK 6 (N) destination MAC mask |
| 808 | OF_DPA_GOTO_TBL 2 goto table ID; zero to drop |
| 809 | OF_DPA_GROUP_ID 4 data for GROUP action must |
| 810 | be an L2 Interface, L2 |
| 811 | Multicast, L2 Flood, |
| 812 | or L2 Overlay group entry |
| 813 | as appropriate |
| 814 | OF_DPA_TUNNEL_LPORT 4 unicast Tenant Bridging |
| 815 | flows specify a tunnel |
| 816 | logical port ID |
| 817 | OF_DPA_OUT_PPORT 2 data for OUTPUT action, |
| 818 | restricted to CONTROLLER, |
| 819 | set to 0 otherwise |
| 820 | |
| 821 | Table ID 60: acl policy |
| 822 | |
| 823 | field width description |
| 824 | ---------------------------------------------------- |
| 825 | OF_DPA_IN_PPORT 4 ingress physical port number |
| 826 | OF_DPA_IN_PPORT_MASK 4 ingress physical port number mask |
| 827 | OF_DPA_ETHERTYPE 2 (N) ethertype |
| 828 | OF_DPA_VLAN_ID 2 (N) vlan ID |
| 829 | OF_DPA_VLAN_ID_MASK 2 (N) vlan ID mask |
| 830 | OF_DPA_VLAN_PCP 2 (N) vlan Priority Code Point |
| 831 | OF_DPA_VLAN_PCP_MASK 2 (N) vlan Priority Code Point mask |
| 832 | OF_DPA_SRC_MAC 6 (N) source MAC |
| 833 | OF_DPA_SRC_MAC_MASK 6 (N) source MAC mask |
| 834 | OF_DPA_DST_MAC 6 (N) destination MAC |
| 835 | OF_DPA_DST_MAC_MASK 6 (N) destination MAC mask |
| 836 | OF_DPA_TUNNEL_ID 4 tunnel ID |
| 837 | OF_DPA_SRC_IP 4 (N) source IPv4. Optional, |
| 838 | can contain IPv4 address, |
| 839 | must be completely masked |
| 840 | if not used |
| 841 | OF_DPA_SRC_IP_MASK 4 (N) IP Mask |
| 842 | OF_DPA_DST_IP 4 (N) destination IPv4 address. |
| 843 | Must be multicast address |
| 844 | OF_DPA_DST_IP_MASK 4 (N) IP Mask |
| 845 | OF_DPA_SRC_IPV6 16 (N) source IPv6 Address. Optional. |
| 846 | Can contain IPv6 address, |
| 847 | must be completely masked |
| 848 | if not used |
| 849 | OF_DPA_SRC_IPV6_MASK 16 (N) IPv6 mask |
| 850 | OF_DPA_DST_IPV6 16 (N) destination IPv6 Address. Must |
| 851 | be multicast address. |
| 852 | OF_DPA_DST_IPV6_MASK 16 (N) IPv6 mask |
| 853 | OF_DPA_SRC_ARP_IP 4 (N) source IPv4 address in the ARP |
| 854 | payload. Only used if ethertype |
| 855 | == 0x0806. |
| 856 | OF_DPA_SRC_ARP_IP_MASK 4 (N) IP Mask |
| 857 | OF_DPA_IP_PROTO 1 IP protocol |
| 858 | OF_DPA_IP_PROTO_MASK 1 IP protocol mask |
| 859 | OF_DPA_IP_DSCP 1 DSCP |
| 860 | OF_DPA_IP_DSCP_MASK 1 DSCP mask |
| 861 | OF_DPA_IP_ECN 1 ECN |
| 862 | OF_DPA_IP_ECN_MASK 1 ECN mask |
| 863 | OF_DPA_L4_SRC_PORT 2 (N) L4 source port, only for |
| 864 | TCP, UDP, or SCTP |
| 865 | OF_DPA_L4_SRC_PORT_MASK 2 (N) L4 source port mask |
| 866 | OF_DPA_L4_DST_PORT 2 (N) L4 destination port, only for |
| 867 | TCP, UDP, or SCTP |
| 868 | OF_DPA_L4_DST_PORT_MASK 2 (N) L4 destination port mask |
| 869 | OF_DPA_ICMP_TYPE 1 ICMP type, only if IP |
| 870 | protocol is 1 |
| 871 | OF_DPA_ICMP_TYPE_MASK 1 ICMP type mask |
| 872 | OF_DPA_ICMP_CODE 1 ICMP code |
| 873 | OF_DPA_ICMP_CODE_MASK 1 ICMP code mask |
| 874 | OF_DPA_IPV6_LABEL 4 (N) IPv6 flow label |
| 875 | OF_DPA_IPV6_LABEL_MASK 4 (N) IPv6 flow label mask |
| 876 | OF_DPA_GROUP_ID 4 data for GROUP action |
| 877 | OF_DPA_QUEUE_ID_ACTION 1 write the queue ID |
| 878 | OF_DPA_NEW_QUEUE_ID 1 queue ID |
| 879 | OF_DPA_VLAN_PCP_ACTION 1 write the VLAN priority |
| 880 | OF_DPA_NEW_VLAN_PCP 1 VLAN priority |
| 881 | OF_DPA_IP_DSCP_ACTION 1 write the DSCP |
| 882 | OF_DPA_NEW_IP_DSCP 1 new DSCP |
| 883 | OF_DPA_TUNNEL_LPORT 4 restrict to valid tunnel |
| 884 | logical port, set to 0 |
| 885 | otherwise. |
| 886 | OF_DPA_OUT_PPORT 2 data for OUTPUT action, |
| 887 | restricted to CONTROLLER, |
| 888 | set to 0 otherwise |
| 889 | OF_DPA_CLEAR_ACTIONS 4 if 1 packets matching flow are |
| 890 | dropped (all other instructions |
| 891 | ignored) |
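
Several ACL match fields are meaningful only under a condition stated in the table: the ARP source IP only for ethertype 0x0806, the L4 ports only for TCP, UDP, or SCTP, and the ICMP type only when the IP protocol is 1. The Python sketch below flags fields supplied outside those conditions. Treating such fields as problems, rather than silently ignoring them, is a choice of this sketch and not a documented device behavior. The helper name is hypothetical, and the protocol numbers are the standard IANA values (TCP 6, UDP 17, SCTP 132).

```python
ETH_P_ARP = 0x0806
IPPROTO_ICMP = 1
L4_PORT_PROTOCOLS = {6, 17, 132}  # TCP, UDP, SCTP


def check_acl_conditionals(fields):
    """Return problems for ACL match fields used outside their stated conditions.

    fields maps TLV names to integer values; absent names are not matched on.
    """
    problems = []
    ethertype = fields.get("OF_DPA_ETHERTYPE")
    ip_proto = fields.get("OF_DPA_IP_PROTO")
    if "OF_DPA_SRC_ARP_IP" in fields and ethertype != ETH_P_ARP:
        problems.append("OF_DPA_SRC_ARP_IP requires ethertype 0x0806")
    for name in ("OF_DPA_L4_SRC_PORT", "OF_DPA_L4_DST_PORT"):
        if name in fields and ip_proto not in L4_PORT_PROTOCOLS:
            problems.append(f"{name} requires TCP, UDP, or SCTP")
    if "OF_DPA_ICMP_TYPE" in fields and ip_proto != IPPROTO_ICMP:
        problems.append("OF_DPA_ICMP_TYPE requires IP protocol 1")
    return problems
```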
| 892 | |
| 893 | TLVs for flow delete and get stats commands are: |
| 894 | |
| 895 | field width description |
| 896 | --------------------------------------------------- |
| 897 | OF_DPA_CMD 2 CMD_[DEL|GET_STATS] |
| 898 | OF_DPA_COOKIE 8 Cookie |
| 899 | |
| 900 | On completion of a get stats command, the descriptor buffer is written back with |
| 901 | the following TLVs: |
| 902 | |
| 903 | field width description |
| 904 | --------------------------------------------------- |
| 905 | OF_DPA_STAT_DURATION 4 Flow duration |
| 906 | OF_DPA_STAT_RX_PKTS 8 Received packets |
| 907 | OF_DPA_STAT_TX_PKTS 8 Transmit packets |
| 908 | |
| 909 | Possible status return codes in descriptor on completion are: |
| 910 | |
| 911 | DESC_COMP_ERR command reason |
| 912 | -------------------------------------------------------------------- |
| 913 | 0 all OK |
| 914 | -ROCKER_EFAULT all head or tail index outside |
| 915 | of ring |
| 916 | -ROCKER_ENXIO all address or data read err on |
| 917 | desc buf |
| 918 | -ROCKER_EMSGSIZE GET_STATS cmd descriptor buffer wasn't |
| 919 | big enough to contain write-back |
| 920 | TLVs |
| 921 | -ROCKER_EINVAL all invalid parameters passed in |
| 922 | -ROCKER_EEXIST ADD entry already exists |
| 923 | -ROCKER_ENOSPC ADD no space left in flow table |
| 924 | -ROCKER_ENOENT MOD|DEL|GET_STATS cookie invalid |
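
The status table above can be read as a mapping from each error name to the commands it may be returned for. The Python sketch below encodes that mapping so a driver test can list the errors to expect for a given command. It uses the symbolic names only (the device reports them negated), and the names FLOW_STATUS_COMMANDS and possible_errors are hypothetical.

```python
# Commands each completion status can be returned for, per the flow table status codes.
ALL = frozenset({"ADD", "MOD", "DEL", "GET_STATS"})
FLOW_STATUS_COMMANDS = {
    "ROCKER_EFAULT": ALL,
    "ROCKER_ENXIO": ALL,
    "ROCKER_EMSGSIZE": frozenset({"GET_STATS"}),
    "ROCKER_EINVAL": ALL,
    "ROCKER_EEXIST": frozenset({"ADD"}),
    "ROCKER_ENOSPC": frozenset({"ADD"}),
    "ROCKER_ENOENT": frozenset({"MOD", "DEL", "GET_STATS"}),
}


def possible_errors(command):
    """Return the sorted error names a given flow command may complete with."""
    return sorted(n for n, cmds in FLOW_STATUS_COMMANDS.items() if command in cmds)
```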
| 925 | |
| 926 | Group Table Interface |
| 927 | --------------------- |
| 928 | |
| 929 | There are commands to add, modify, delete, and get stats of group table |
| 930 | entries. The commands are issued using the DMA CMD descriptor ring. The |
| 931 | following commands are defined: |
| 932 | |
| 933 | CMD_ADD: add an entry to group table |
| 934 | CMD_MOD: modify an entry in group table |
| 935 | CMD_DEL: delete an entry from group table |
| 936 | CMD_GET_STATS: get stats for group entry |
| 937 | |
| 938 | TLVs for add and modify commands are: |
| 939 | |
| 940 | field width description |
| 941 | ----------------------------------------------------------- |
| 942 | FLOW_GROUP_CMD 2 CMD_[ADD|MOD] |
| 943 | FLOW_GROUP_ID 2 Flow group ID |
| 944 | FLOW_GROUP_TYPE 1 Group type: |
| 945 | 0: L2 interface |
| 946 | 1: L2 rewrite |
| 947 | 2: L3 unicast |
| 948 | 3: L2 multicast |
| 949 | 4: L2 flood |
| 950 | 5: L3 interface |
| 951 | 6: L3 multicast |
| 952 | 7: L3 ECMP |
| 953 | 8: L2 overlay |
| 954 | FLOW_VLAN_ID 2 Vlan ID (types 0, 3, 4, 6) |
| 955 | FLOW_L2_PORT 2 Port (type 0) |
| 956 | FLOW_INDEX 4 Index (all types but 0) |
| 957 | FLOW_OVERLAY_TYPE 1 Overlay sub-type (type 8): |
| 958 | 0: Flood unicast tunnel |
| 959 | 1: Flood multicast tunnel |
| 960 | 2: Multicast unicast tunnel |
| 961 | 3: Multicast multicast tunnel |
| 962 | FLOW_GROUP_ACTION nest |
| 963 | FLOW_GROUP_ID 2 next group ID in chain (all |
| 964 | types except 0) |
| 965 | FLOW_OUT_PORT 4 egress port (types 0, 8) |
| 966 | FLOW_POP_VLAN_TAG 1 strip outer VLAN tag (type 1 |
| 967 | only) |
| 968 | FLOW_VLAN_ID 2 (types 1, 5) |
| 969 | FLOW_SRC_MAC 6 (types 1, 2, 5) |
| 970 | FLOW_DST_MAC 6 (types 1, 2) |
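
Because most group TLVs apply only to certain group types, it can help to tabulate which TLVs belong to each type. The Python sketch below does this, assuming the TLVs listed after FLOW_GROUP_ACTION are nested inside it. FLOW_GROUP_CMD, FLOW_GROUP_ID, and FLOW_GROUP_TYPE apply to every type and are omitted. All Python names are hypothetical.

```python
ALL_TYPES = frozenset(range(9))  # group types 0 through 8

# TLV name -> group types it applies to; nested TLVs live in FLOW_GROUP_ACTION.
TOP_LEVEL_TLVS = {
    "FLOW_VLAN_ID": frozenset({0, 3, 4, 6}),
    "FLOW_L2_PORT": frozenset({0}),
    "FLOW_INDEX": ALL_TYPES - {0},
    "FLOW_OVERLAY_TYPE": frozenset({8}),
}
NESTED_ACTION_TLVS = {
    "FLOW_GROUP_ID": ALL_TYPES - {0},
    "FLOW_OUT_PORT": frozenset({0, 8}),
    "FLOW_POP_VLAN_TAG": frozenset({1}),
    "FLOW_VLAN_ID": frozenset({1, 5}),
    "FLOW_SRC_MAC": frozenset({1, 2, 5}),
    "FLOW_DST_MAC": frozenset({1, 2}),
}


def tlvs_for_group_type(group_type):
    """Return (top_level, nested) TLV name lists that apply to a group type."""
    top = sorted(n for n, t in TOP_LEVEL_TLVS.items() if group_type in t)
    nested = sorted(n for n, t in NESTED_ACTION_TLVS.items() if group_type in t)
    return top, nested
```

For an L2 interface group (type 0), this yields FLOW_L2_PORT and FLOW_VLAN_ID at the top level and FLOW_OUT_PORT inside the action nest.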
| 971 | |
| 972 | TLVs for group delete and get stats commands are: |
| 973 | |
| 974 | field width description |
| 975 | ----------------------------------------------------------- |
| 976 | FLOW_GROUP_CMD 2 CMD_[DEL|GET_STATS] |
| 977 | FLOW_GROUP_ID 2 Flow group ID |
| 978 | |
| 979 | On completion of a get stats command, the descriptor buffer is written back with |
| 980 | the following TLVs: |
| 981 | |
| 982 | field width description |
| 983 | --------------------------------------------------- |
| 984 | FLOW_GROUP_ID 2 Flow group ID |
| 985 | FLOW_STAT_DURATION 4 Flow duration |
| 986 | FLOW_STAT_REF_COUNT 4 Flow reference count |
| 987 | FLOW_STAT_BUCKET_COUNT 4 Flow bucket count |
| 988 | |
| 989 | Possible status return codes in descriptor on completion are: |
| 990 | |
| 991 | DESC_COMP_ERR command reason |
| 992 | -------------------------------------------------------------------- |
| 993 | 0 all OK |
| 994 | -ROCKER_EFAULT all head or tail index outside |
| 995 | of ring |
| 996 | -ROCKER_ENXIO all address or data read err on |
| 997 | desc buf |
| 998 | -ROCKER_EMSGSIZE GET_STATS cmd descriptor buffer wasn't |
| 999 | big enough to contain write-back |
| 1000 | TLVs |
| 1001 | -ROCKER_EINVAL ADD|MOD invalid parameters passed in |
| 1002 | -ROCKER_EEXIST ADD entry already exists |
| 1003 | -ROCKER_ENOSPC ADD no space left in group table |
| 1004 | -ROCKER_ENOENT MOD|DEL|GET_STATS group ID invalid |
| 1005 | -ROCKER_EBUSY DEL group reference count non-zero |
| 1006 | -ROCKER_ENODEV ADD next group ID doesn't exist |
| 1007 | |
| 1008 | |
| 1009 | |
| 1010 | References |
| 1011 | ========== |
| 1012 | |
| 1013 | [1] OpenFlow Data Plane Abstraction (OF-DPA) Abstract Switch Specification, |
| 1014 | Version 1.0, from Broadcom Corporation, February 21, 2014. |