Little notes on things that don't really deserve a long writeup.
Reported here: Xilinx developer forum. Unfortunately, no Xilinx dev or forum administrator has confirmed this issue, but the minimal example shows how this bug can be replicated. You can download the minimal example by clicking here.
Small program that demonstrates HLS data-types behaviour with ap_int and ap_uint data-types:
#include <stdio.h>
#include "ap_int.h"

int main() {
    ap_int<8> acc = 12;
    ap_uint<8> res = 0;

    res += acc;
    printf("res = %d\n", (int)res);

    acc = -6;
    res += acc;
    printf("res = %d\n", (int)res);

    acc = -8;
    res += acc;
    printf("res = %d\n", (int)res);

    return 0;
}
Output result is:
res = 12
res = 6
res = 254
Conclusion: The accumulation happens correctly in signed arithmetic, but the result is stored and printed as an unsigned integer, so -2 wraps around to 254.
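The wraparound can be reproduced without the HLS headers. The following is a minimal Python sketch, not the ap_uint implementation: it assumes an unsigned 8-bit register simply keeps the low 8 bits of each sum. The helper names wrap_unsigned and accumulate are made up for illustration.

```python
def wrap_unsigned(value, bits=8):
    """Keep only the low `bits` bits, as an unsigned fixed-width register would."""
    return value & ((1 << bits) - 1)


def accumulate(values, bits=8):
    """Add signed values into an unsigned accumulator; return each stored result."""
    res = 0
    history = []
    for acc in values:
        # The sum itself is computed with the sign intact; only the store wraps.
        res = wrap_unsigned(res + acc, bits)
        history.append(res)
    return history


for res in accumulate([12, -6, -8]):
    print(f"res = {res}")
```

The last step computes 6 + (-8) = -2, which is stored as 256 - 2 = 254 in 8 bits.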
Wow, this was hard to set up. Some steps that are not documented, especially for allowing a TW client on a different machine to sync with the server (e.g. some server on the VPN):
Very useful for speeding up tedious tasks. E.g. creating Verilog testbenches requires lots of wire declarations when connecting up the interface for the DUT: given module Top ( a, b, c ), in the testbench we want to instantiate Top DUT ( .a(a), .b(b), .c(c) ), which can be done with a single well-constructed regex command. In vim-regex, to capture a matched string, you can do a visual select and use s/\(YOUR_REGEX_TO_MATCH\)/\.\1(\1)/g (or %s/.../ to apply it to the whole file). Here, the \( \) pair is a capturing group around your regex, and \1 back-references whatever that group matched.
Another example: on the text "James Bond", use s/\(.*\) \(.*\)/I'm \2, \1 \2/g, which will produce "I'm Bond, James Bond".
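The same two substitutions can be tried outside vim. This is a small sketch using Python's re module, where groups are written ( ) rather than vim's \( \); the function names connect_ports and introduce are hypothetical, and \w+ is just one possible choice for YOUR_REGEX_TO_MATCH.

```python
import re


def connect_ports(port_list):
    """Turn 'a, b, c' into '.a(a), .b(b), .c(c)' using one back-reference."""
    # (\w+) captures each port name; \1 re-inserts it twice in the replacement.
    return re.sub(r"(\w+)", r".\1(\1)", port_list)


def introduce(name):
    """Reorder two captured words, as in the James Bond example."""
    return re.sub(r"(.*) (.*)", r"I'm \2, \1 \2", name)


print(connect_ports("a, b, c"))
print(introduce("James Bond"))
```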
My dotfiles: https://github.com/sidmontu/dotfiles.
I followed this tutorial: https://www.atlassian.com/git/tutorials/dotfiles
On a new system, inside home directory, run:
PYNQ -- Holy grail of the PYNQ software framework.
UG902 -- High-level synthesis (details about HLS data-types, IP cores, pragmas, more)
WP509 -- RF data converter IP block, and understanding the key parameters it exposes to the programmer.
DS926 -- RFSoC data sheet
PG269 -- RF data converter IP data sheet
UG1270 -- VivadoHLS optimization guide (pragmas and their effects mainly).
VivadoHLS pragma list -- HTML page of HLS pragmas, easier to read and navigate than the PDFs
PG085 -- AXI4 stream protocol IP suite for Xilinx.
RFSoC -- getting started guide (setup, tests, etc)
If you want to use the Xilinx FFT IP core in VivadoHLS, where you're repeatedly calling the HLS core function to evaluate on different vectors of data (not unrolled, since you might not want to instantiate many copies of the FFT core in hardware), then the HLS tool will give an incompatibility error on the interface data-type (ap_auto vs ap_fifo). To solve that, insert HLS interface pragma to declare the input ports of the FFT IP core as type ap_fifo. See: https://forums.xilinx.com/t5/Vivado-High-Level-Synthesis-HLS/ap-auto-incompatible-with-ap-fifo-error-when-using-FFT-IP/td-p/703730.
In VivadoHLS, if you wish to interpret an ap_uint/ap_int data-type as fixed-point data-type, i.e. you want the ap_fixed number to have the bit-exact value as the ap_uint/ap_int number, then you can use the .V statement. E.g. ap_fixed<8,6> fxp_num is a Q6.2 fixed-point number, and ap_uint<8> uint_num = 0x0f, by setting "fxp_num.V = uint_num", fxp_num bits will be set to 0x0f, which will be interpreted as 3.75. See: https://forums.xilinx.com/t5/Vivado-TCL-Community/A-question-about-Type-Conversion-in-Vivado-HLS/td-p/711413
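To sanity-check the 3.75 figure, here is a rough Python sketch of how a raw bit pattern is read as a signed Q6.2 number. It only models the arithmetic (two's complement, then divide by 2^fractional_bits), not the ap_fixed class itself; bits_to_fixed is a made-up helper name.

```python
def bits_to_fixed(raw, total_bits=8, int_bits=6, signed=True):
    """Interpret the low `total_bits` of `raw` as a fixed-point value."""
    frac_bits = total_bits - int_bits
    raw &= (1 << total_bits) - 1
    # Two's complement: a set top bit means the pattern encodes a negative number.
    if signed and raw & (1 << (total_bits - 1)):
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)


print(bits_to_fixed(0x0F))  # 15 / 4
print(bits_to_fixed(0xFF))  # -1 / 4
```

With 2 fractional bits, 0x0f = 15 is read as 15 / 4 = 3.75, matching the value above.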
No Clock-Domain-Crossing (CDC) in VivadoHLS. Which is weird, since you can specify a separate clock for your s_axilite ports (e.g. #pragma HLS interface port=axilite_port clock=config_clk). Either it's broken, or I need to investigate further whether the timing errors reported on the path can actually be ignored as false paths. See https://forums.xilinx.com/t5/Vivado-High-Level-Synthesis-HLS/no-clock-domain-crossing-in-HLS/td-p/933543.
How to connect interrupts in block design (or how not to): https://forums.xilinx.com/t5/Embedded-Linux/A-key-error-when-I-am-trying-to-access-a-DMA-IP-on-PYNQ-Z1-board/td-p/847817
For some obscure reason, if Xilinx tools (e.g. Vivado) are in your $PATH, some C/C++ projects report build errors. I observed this when trying to build the Ettus RFNoC framework using PyBOMBs. Removing the Xilinx tools from $PATH in a fresh terminal does the trick.
Pynq Jupyter environment sometimes fails to load even the basic pynq overlay modules. If you see a "ValueError: bad marshal data", then perhaps there are corrupt compiled python module files (.pyc). I've fixed this once by running "(sudo) find /usr -name '*.pyc' -delete". Stackoverflow link.
Interesting little feature in Python: instead of giving each instance a dynamic __dict__ of attributes, __slots__ statically declares the attributes you know your class instances will have, i.e. the set of attributes cannot be extended at runtime (the values can still change). The effect is better runtime performance (faster attribute lookup) and savings in memory usage. See: Stackoverflow discussion
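A quick sketch of the behaviour, using a made-up Point class: declared attributes work as usual, while assigning an undeclared one raises AttributeError because the instance has no __dict__ to put it in.

```python
class Point:
    # Only these two attributes can ever exist on an instance.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10  # values of declared slots can still be reassigned

try:
    p.z = 3  # not declared in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False

print(rejected, hasattr(p, "__dict__"))
```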
If you have a very large dataset that is dynamically loaded from disk, i.e. you keep track of pointers in your get_data() routine to yield data to the trainer at runtime, remember to reset these pointers appropriately at the beginning of the get_data() routine. This is because Tensorpack spawns multiple threads for feeding the Tensorpack trainer at runtime, and without such a reset (e.g. in a "reset_state()" function) you will get out-of-range memory access errors.
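The pattern looks roughly like the sketch below. DiskDataFlow is a hypothetical plain-Python stand-in, not Tensorpack's own DataFlow base class, and an in-memory list stands in for the data on disk; it only illustrates resetting the cursor at the top of get_data() so that every new pass starts from the beginning.

```python
class DiskDataFlow:
    """Hypothetical stand-in for a dataflow that reads batches via a cursor."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size
        self.cursor = 0

    def get_data(self):
        # Reset first: a stale cursor left over from a previous pass would
        # otherwise start past the end of the dataset.
        self.cursor = 0
        while self.cursor < len(self.samples):
            yield self.samples[self.cursor:self.cursor + self.batch_size]
            self.cursor += self.batch_size


flow = DiskDataFlow(list(range(5)), batch_size=2)
first_pass = list(flow.get_data())
second_pass = list(flow.get_data())
print(first_pass, second_pass)
```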
A Pblock is a physical block used to constrain logic to physical areas on the FPGA. Pblocks can be nested, but Xilinx recommends at most one level of nesting, and avoiding it altogether if possible. A Pblock can be given a custom shape (instead of just a rectangle) by clicking on the "add pblock rectangle" option when selecting the Pblock you wish to reshape. The Pblock properties can be exported to an XDC constraints file automatically using the "save constraints as" option under "File". These constraints are basically Tcl commands that look something like the following:
create_pblock "name you wish to give to the pblock"
add_cells_to_pblock [get_pblocks "name of pblock"] [get_cells -quiet [list "name of module you wish to assign to pblock"]]
resize_pblock [get_pblocks "name of pblock"] -add {SLICE_"XY-COORD":SLICE_"XY-COORD"}
resize_pblock [get_pblocks "name of pblock"] -add {DSP48_"XY-COORD":DSP48_"XY-COORD"}
resize_pblock [get_pblocks "name of pblock"] -add {RAMB18_"XY-COORD":RAMB18_"XY-COORD"}
resize_pblock [get_pblocks "name of pblock"] -add {RAMB36_"XY-COORD":RAMB36_"XY-COORD"}
You can do a DRC check for floorplanning to make sure pblock specified for the module meets expectations. Basically, this is a sanity test.
Vivado has an auto-floorplanning tool as well (under Tools - Floorplanning - Auto-create Pblocks). Also worth checking out the "Window - Physical Constraints" tab, which shows connectivity between different Pblocks for better visualization of dataflow in your circuit. Individual logic can also be locked into exact locations so that there is consistency between runs (i.e. you don't get a completely different placed/routed solution, which would make it impossible to tell whether your changes made any difference between implementation runs).
Floorplanning can improve performance and consistency between runs. Some guidelines: floorplan what is necessary, do not over-floorplan. Choose modules to floorplan that do not have connectivity to lots of other modules; highly connected modules are better left to be broken apart, which reduces critical path lengths. Use RTL hierarchy, DRC checks, properties and other Vivado tools to judge what is best and how to floorplan. Finally, floorplanning can be an iterative process.
Reference resource: https://www.xilinx.com/video/hardware/design-analysis-floorplanning-with-vivado.html
Weird thing I found: if you are using termite as the main terminal instead of urxvt, there is an idiosyncrasy in how a bspc rule has to be declared. In fact, going by a stackoverflow post I saw, urxvt has the same oddity.
Instead of :
bspc rule -a termite state=floating
bspc rule -a urxvt state=floating
You must declare the following (basically the capitalization) :
bspc rule -a Termite state=floating
bspc rule -a URxvt state=floating
for the rule to register when creating the window. By the way, what's with the weird names (sxhkd, urxvt)?
Problems: panel does not launch automatically on startx.
Tested with Vivado(HLS) 2017.4
For reference: https://www.youtube.com/watch?v=Dupyek4NUoI
For reference: Pynq-Z1 board part number is xc7z020clg400-1
For reference: Pynq-Z1 board files from: https://github.com/cathalmccabe/pynq-z1_board_files (installation instructions in the .md file in repo)
Steps (roughly) :
1) Easiest way is to start with VivadoHLS, write your own HLS module that interfaces with the host CPU. Add HLS INTERFACE pragmas to define the I/O ports and their types. e.g. #pragma HLS INTERFACE s_axilite port=
Obscure artifact #1: Interrupts from the AXI DMA block have to be handled by an AXI Interrupt controller, instead of simply being concatenated into the IRQ_F2P port of the Zynq-PS. See https://forums.xilinx.com/t5/Embedded-Linux/A-key-error-when-I-am-trying-to-access-a-DMA-IP-on-PYNQ-Z1-board/td-p/847817
Mostly grabbed from a great semi-technical summary in a reddit comment. Source: https://www.reddit.com/r/explainlikeimfive/comments/9cbi8p/eli5_what_is_the_fast_fourier_transform_or_fft/e5axg1n/
https://www.tensorflow.org/guide/graphs#visualizing_your_graph
To see the graph of the model you're training, run tensorboard --logdir
From: http://venge.net/graydon/talks/CompilerTalk-2019.pdf
Frances Allen "A Catalogue of Optimizing Transformations" (1971 paper). Write 8 passes to get ~80% best-case performance (used by many modern compilers). They are:
2) Run C synthesis, make sure there are no errors and you understand the warnings.
3) Click button for "Export RTL". We can use default settings and click ok. This will package your design as an IP. Choose the language you prefer as well (I go with Verilog mostly).
4) This is the first interesting part to pay attention to: Under "solution1" folder (assuming you're using the default solution name for Vivado synthesis process), go to impl>misc>drivers>"your ip name">src and open x"ip name"_hw.h. Make a note of all the addresses of each register that is being assigned to your ports. You'll need this later to be able to write your software Python drivers that send data to these memory-mapped AXI ports.
5) Switch over to Vivado.
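For step 4, rather than copying the register addresses by hand, the header can be scanned for them. The sketch below assumes the generated x"ip name"_hw.h contains lines of the form "#define ..._ADDR_... 0x..": the SAMPLE_HEADER text, the XTOP_AXILITES_* names and the register_offsets helper are all hypothetical, so check them against your own header before relying on this.

```python
import re

# Hypothetical excerpt standing in for the generated *_hw.h file.
SAMPLE_HEADER = """
#define XTOP_AXILITES_ADDR_AP_CTRL 0x00
#define XTOP_AXILITES_ADDR_A_DATA  0x10
#define XTOP_AXILITES_BITS_A_DATA  32
#define XTOP_AXILITES_ADDR_B_DATA  0x18
"""


def register_offsets(header_text):
    """Collect every '#define <NAME containing _ADDR_> <hex>' as name -> offset."""
    pattern = re.compile(
        r"^#define\s+(\w*_ADDR_\w+)\s+(0x[0-9a-fA-F]+)", re.MULTILINE
    )
    return {name: int(value, 16) for name, value in pattern.findall(header_text)}


offsets = register_offsets(SAMPLE_HEADER)
for name, offset in offsets.items():
    print(f"{name} = 0x{offset:02x}")
```

The resulting dictionary gives the offsets to use later when writing to the memory-mapped AXI ports from the Python driver.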
AXI-STREAM interface on Pynq (with DMA)
DFT/FFT footnotes
Tensorflow visualizing the graph
Compilers
Notes from Caltech lecture on SVMs: https://www.youtube.com/watch?v=eHsErlPJWUU
Notes from Caltech lecture on kernel methods: https://www.youtube.com/watch?v=XUj5JbQihlU