Altera FPGA PCI Express core with Chaining DMA
Introduction
Field-programmable gate arrays (FPGAs) increasingly come with a PCI Express implementation, either as a soft core (i.e. in programmable logic) or as a hard core (i.e. as silicon on-chip). Altera's PCI Express Megacore can be used to instantiate such a core on most FPGAs, such as Cyclone II and III, Arria I and II, and Stratix II, III and IV. It uses either the on-board transceivers or an external PCIe PHY chip.
The Megacore will instantiate an example End Point reference design called "Chaining DMA", which includes a small on-chip memory and a chaining DMA controller. The controller fetches a descriptor table from Root Complex memory and performs the DMA copies listed in that table.
Driver
This Linux device driver controls the Chaining DMA application and acts as a working reference design.
Progress
To Do
- Convert current iteration of static DMA to a more generic implementation of variable DMA.
Finished
Boards
PCI Express core configuration
- The driver currently in development targets cores generated with the PCI Express Compiler version 8.0. The goal is to detect and support newer versions.
- The core can be configured as a "Legacy" or "Native" PCI Express End Point. The goal is to support both options.
- BAR[0] size must be configured for 32 KiB or more.
- BAR[2] size must be configured for 256 bytes or more. This is where the Root Complex (i.e. CPU) memory address of the DMA descriptor tables is written and where the DMA is initialized.
- BAR address sizes can be 32-bit or 64-bit. The goal is to properly detect and support both.
- The Device ID and Vendor ID must be unchanged, i.e. 0x???? and 0x????. You may change the subsystem vendor ID and subsystem device ID to your liking.
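The BAR size requirements above could be checked at probe time. The following is only a user-space sketch: `chdma_bars_ok()` and the two `CHDMA_*` constants are hypothetical names, and a real driver would presumably obtain the lengths from `pci_resource_len()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum BAR sizes taken from the configuration list above. */
#define CHDMA_BAR0_MIN_LEN (32u * 1024u) /* BAR[0]: 32 KiB or more */
#define CHDMA_BAR2_MIN_LEN 256u          /* BAR[2]: 256 bytes or more */

/*
 * Hypothetical helper: returns true when both BAR lengths meet the
 * minimums. In a kernel driver the two lengths would typically come
 * from pci_resource_len(dev, 0) and pci_resource_len(dev, 2).
 */
static bool chdma_bars_ok(uint64_t bar0_len, uint64_t bar2_len)
{
	return bar0_len >= CHDMA_BAR0_MIN_LEN &&
	       bar2_len >= CHDMA_BAR2_MIN_LEN;
}
```

A probe routine could refuse to bind to the device when this check fails, instead of touching registers that may not exist.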
Documentation
DMA Header (in End Point memory BAR[2])
| address | field | DMA Read or Write | comment |
| 0x00 | Global Control & Number of Descriptors | W | |
| 0x04 | Bus Address (upper) of Descriptor Table | W | Points to a table in Root Complex memory |
| 0x08 | Bus Address (lower) of Descriptor Table | W | |
| 0x0c | Reserved & Last Descriptor Available (RCLAST) | W | RCLAST = 0 means descriptor #0 is ready for processing by the End Point |
| 0x10 | Global Control & Number of Descriptors | R | |
| 0x14 | Bus Address (upper) of Descriptor Table | R | |
| 0x18 | Bus Address (lower) of Descriptor Table | R | |
| 0x1c | Reserved & Last Descriptor Available (RCLAST) | R | RCLAST = 0 means descriptor #0 is ready to be acted-upon by the End Point |
Notes:
- The fields must be written with DWORD (32-bit) writes, i.e. in Linux use iowrite32().
- The fields are write-only. Reading from these addresses returns a PCIe error (this can hang your system!).
- Writing to the 0x0c or 0x1c location starts the corresponding DMA operation.
- Open question: does the design support concurrent DMA read and write operations?
- The Root Complex may increment RCLAST during the DMA transfer (this is not tested yet).
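The write sequence implied by the header table and notes can be sketched as follows. This is a user-space model under stated assumptions, not the driver's code: `reg_write32()` stands in for `iowrite32()`, `chdma_start()` and the enum names are hypothetical, and the control word is passed through as an opaque value because its bit layout is not documented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets inside BAR[2], taken from the DMA Header table. */
enum {
	CHDMA_HDR_WRITE = 0x00, /* header marked "W" in the table */
	CHDMA_HDR_READ  = 0x10, /* header marked "R" in the table */
	CHDMA_W0_CTRL   = 0x00, /* Global Control & Number of Descriptors */
	CHDMA_W1_ADDR_H = 0x04, /* descriptor table bus address, upper */
	CHDMA_W2_ADDR_L = 0x08, /* descriptor table bus address, lower */
	CHDMA_W3_RCLAST = 0x0c  /* RCLAST; writing it starts the DMA */
};

/* Stand-in for iowrite32(); only DWORD-aligned 32-bit writes allowed. */
static void reg_write32(volatile uint32_t *bar2, size_t off, uint32_t val)
{
	assert(off % 4 == 0);
	bar2[off / 4] = val;
}

/*
 * Hypothetical helper: program one DMA header and kick off the transfer.
 * The registers are never read back, since they are write-only.
 */
static void chdma_start(volatile uint32_t *bar2, size_t hdr, uint32_t ctrl,
			uint64_t table_bus_addr, uint32_t rclast)
{
	assert(hdr == CHDMA_HDR_WRITE || hdr == CHDMA_HDR_READ);
	reg_write32(bar2, hdr + CHDMA_W0_CTRL, ctrl);
	reg_write32(bar2, hdr + CHDMA_W1_ADDR_H,
		    (uint32_t)(table_bus_addr >> 32));
	reg_write32(bar2, hdr + CHDMA_W2_ADDR_L, (uint32_t)table_bus_addr);
	/* Written last: per the notes, this write starts the operation. */
	reg_write32(bar2, hdr + CHDMA_W3_RCLAST, rclast);
}
```

Writing RCLAST last matters because that write is what triggers the engine; the control word and table address should already be in place by then.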
DMA Table (in Root Complex memory)
Each table starts with four 32-bit words (16 bytes) in which the DMA controller writes its progress, followed by an array of descriptors, each four 32-bit words (16 bytes) in size.
| address | field | Access | comment |
| 0x00 | Reserved | R/W | |
| 0x04 | Reserved | R/W | |
| 0x08 | Reserved | R/W | |
| 0x0c | Reserved & Last Descriptor Completed (EPLAST) | R/W | |
| 0x10 | Control & Transfer Length (DWORDS or bytes??) | R/W | Descriptor #0 |
| 0x14 | End Point address | R/W | Descriptor #0 |
| 0x18 | Bus Address (msb) for Root Complex memory | R/W | Descriptor #0 |
| 0x1c | Bus Address (lsb) for Root Complex memory | R/W | Descriptor #0 |
| 0x20 | Control & Transfer Length (DWORDS or bytes??) | R/W | Descriptor #1 |
| 0x24 | End Point address | R/W | Descriptor #1 |
| 0x28 | Bus Address (msb) for Root Complex memory | R/W | Descriptor #1 |
| 0x2c | Bus Address (lsb) for Root Complex memory | R/W | Descriptor #1 |
| ... | ... | ... | ... |
Notes:
- Reportedly the whole table may not exceed 4096 bytes or cross a 4096-byte boundary. pci_alloc_consistent(..., 4096, ...) takes care of that on most platforms.
- 4096 bytes holds the 16-byte table header plus 255 descriptors. If each descriptor describes a 4096-byte copy, this gives 255 * 4096 = 1044480 bytes, just under 1 MiB, per DMA operation.
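The table layout and the size arithmetic above can be written down as C structures. This is a sketch with hypothetical names (`chdma_desc`, `chdma_table`, `chdma_table_set()`, `chdma_max_bytes()`). It assumes the upper address word comes before the lower one, mirroring the DMA header, and it ignores byte order; a driver on a big-endian host would presumably need cpu_to_le32() conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHDMA_TABLE_BYTES 4096u
#define CHDMA_MAX_DESC    255u

/* One descriptor: four 32-bit words (16 bytes). */
struct chdma_desc {
	uint32_t ctrl_len;  /* Control & Transfer Length (units unconfirmed) */
	uint32_t ep_addr;   /* End Point address */
	uint32_t rc_addr_h; /* Root Complex bus address, upper (assumed order) */
	uint32_t rc_addr_l; /* Root Complex bus address, lower (assumed order) */
};

/* 16-byte progress header followed by the descriptor array. */
struct chdma_table {
	uint32_t reserved[3];
	uint32_t eplast;    /* Reserved & Last Descriptor Completed (EPLAST) */
	struct chdma_desc desc[CHDMA_MAX_DESC];
};

/* Compile-time checks of the sizes and offsets given in the text. */
_Static_assert(sizeof(struct chdma_desc) == 16, "descriptor is 16 bytes");
_Static_assert(sizeof(struct chdma_table) == CHDMA_TABLE_BYTES,
	       "header plus 255 descriptors fill 4096 bytes");
_Static_assert(offsetof(struct chdma_table, desc) == 0x10,
	       "descriptor #0 starts at 0x10");

/* Hypothetical helper: fill descriptor n, splitting the 64-bit address. */
static void chdma_table_set(struct chdma_table *t, unsigned int n,
			    uint32_t ctrl_len, uint32_t ep_addr,
			    uint64_t rc_bus_addr)
{
	assert(n < CHDMA_MAX_DESC);
	t->desc[n].ctrl_len = ctrl_len;
	t->desc[n].ep_addr = ep_addr;
	t->desc[n].rc_addr_h = (uint32_t)(rc_bus_addr >> 32);
	t->desc[n].rc_addr_l = (uint32_t)rc_bus_addr;
}

/* Upper bound per DMA operation if every descriptor moves chunk bytes. */
static uint64_t chdma_max_bytes(uint32_t chunk)
{
	return (uint64_t)CHDMA_MAX_DESC * chunk;
}
```

With a 4096-byte chunk per descriptor, `chdma_max_bytes(4096)` evaluates to 1044480, which matches the figure in the notes.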
Linux Kernel APIs
Scatterlists
Kernel configuration (Kconfig) entry
config ALTPCIECHDMA
	tristate "Altera PCI Express Chaining DMA Test Driver"
	---help---
	  The Altera PCIe Chaining DMA test driver will perform tests against
	  FPGA/ASIC devices that have Altera's PCI Express core with the
	  Chaining DMA application generated by the Megacore.

	  Devices range from Cyclone II FPGA with soft PCIe IP core up to a
	  Stratix IV with a silicon PCIe core.

	  This driver controls the DMA engine by performing DMA transfers
	  in loop-back fashion and doing memory compares to verify the
	  loop-back was successful.

	  The driver acts as a test driver to verify your PCIe core. It may
	  be used as a basis for your custom logic.
Kernel Makefile entry
obj-$(CONFIG_ALTPCIECHDMA) += altpciechdma.o