Debugging Guide
Debugging hardware accelerators requires different strategies for simulation versus FPGA deployment. This guide covers the tools and techniques available in Beethoven for finding and fixing issues at each stage.
Simulation Debugging
Waveform Generation
Beethoven's simulation backends automatically generate waveforms for debugging. The simulator type determines the waveform format:
- Verilator: VCD (dump.vcd)
- VCS: FSDB or VPD
- Icarus Verilog: VCD
Verilator
cd Beethoven-Runtime/build
cmake .. -DTARGET=sim -DSIMULATOR=verilator
make -j
./BeethovenRuntime
Verilator generates dump.vcd in the working directory. View with GTKWave:
gtkwave dump.vcd
VCD files can grow very large for long simulations. Consider limiting simulation time or using selective signal dumping.
VCS
cd Beethoven-Runtime/build
cmake .. -DTARGET=sim -DSIMULATOR=vcs
make -j
../scripts/build_vcs.sh
./BeethovenTop
VCS generates FSDB or VPD waveforms. Always use the finish command to properly flush waveforms:
# In VCS shell after CTRL+C
finish
Never kill VCS with SIGKILL. Always use finish to ensure waveform files are properly written.
Icarus Verilog
cd Beethoven-Runtime
make sim_icarus
Icarus generates VCD waveforms. Use finish to close the waveform file properly:
# After CTRL+C
finish
Signal Tracing Best Practices
- Start Narrow: Only trace signals in suspect modules initially
- Use Hierarchy: Navigate module hierarchy in waveform viewer to find signals
- Add Markers: Mark cycle boundaries and key events (command issue, response return)
- Compare Expected vs Actual: Run reference implementation alongside accelerator
- Check Handshakes: Verify valid/ready protocol compliance on all interfaces
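When a handshake looks suspicious, it can help to transcribe or export the valid, ready, and bits values of one interface (one sample per clock cycle) and check them mechanically. The sketch below is not a Beethoven tool; HandshakeSample and check_handshake are hypothetical names. It checks the AXI-style rule that once valid rises, it stays high with stable bits until ready is also high. Chisel's IrrevocableIO makes the same promise, while plain DecoupledIO places no such requirement, so whether a given interface must obey this rule is an assumption to confirm against that interface's own specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One clock cycle of a valid/ready interface, transcribed or exported from a
// waveform. Hypothetical type for this sketch, not part of Beethoven.
struct HandshakeSample {
  bool valid;
  bool ready;
  std::uint64_t bits;  // payload (truncated to 64 bits for this sketch)
};

struct HandshakeReport {
  std::size_t transfers = 0;            // cycles where valid && ready
  std::vector<std::string> violations;  // human-readable rule violations
};

// Checks the AXI/IrrevocableIO-style rule: once valid is high, it must stay
// high with unchanged bits until the cycle in which ready is also high.
HandshakeReport check_handshake(const std::vector<HandshakeSample>& trace) {
  HandshakeReport report;
  bool pending = false;  // valid was high last cycle without a transfer
  std::uint64_t pending_bits = 0;
  for (std::size_t cycle = 0; cycle < trace.size(); ++cycle) {
    const HandshakeSample& s = trace[cycle];
    if (pending && !s.valid) {
      report.violations.push_back("cycle " + std::to_string(cycle) +
                                  ": valid dropped before ready");
    } else if (pending && s.bits != pending_bits) {
      report.violations.push_back("cycle " + std::to_string(cycle) +
                                  ": bits changed while waiting for ready");
    }
    if (s.valid && s.ready) {
      ++report.transfers;  // handshake completed ("fire")
      pending = false;
    } else {
      pending = s.valid;  // still offering data without acceptance
      pending_bits = s.bits;
    }
  }
  return report;
}
```

Each fire (valid && ready in the same cycle) counts as one transfer, so comparing transfers against the number of commands or memory beats you expected is a quick sanity check.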
Memory Debugging
Beethoven integrates DRAMsim3 for cycle-accurate DRAM simulation. To debug memory timing issues:
# VPI-based simulators (VCS, Icarus)
cmake .. -DDRAMSIM_CONFIG=DDR4_8Gb_x16_3200.ini
# Verilator
./BeethovenRuntime -dramconfig DDR4_8Gb_x16_3200.ini
DRAMsim3 configurations available:
- DDR4_8Gb_x16_3200.ini - Default (3200 MT/s)
- DDR4_8Gb_x8_2400.ini - Slower DDR4 (2400 MT/s)
- DDR3_8Gb_x8_1600.ini - DDR3 variant (1600 MT/s)
If your accelerator performs well in simulation but poorly on FPGA, try slower DRAMsim3 configs to identify memory bottlenecks.
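To see how much headroom a slower configuration removes, compare theoretical peak bandwidth: transfer rate times bytes moved per transfer. This sketch assumes a 64-bit (8-byte) channel. The x8/x16 in the file names is the width of the individual DRAM devices, while the channel width actually simulated is set inside the DRAMsim3 .ini file, so check it there. peak_gb_per_s is a hypothetical helper.

```cpp
#include <cassert>

// Theoretical peak bandwidth of one DRAM channel in GB/s:
// (mega-transfers per second x 1e6) x bytes per transfer / 1e9.
// ASSUMPTION: 8-byte (64-bit) channel by default; check the bus width in the .ini.
double peak_gb_per_s(double mega_transfers_per_s, int bus_bytes = 8) {
  assert(mega_transfers_per_s > 0.0 && bus_bytes > 0);
  return mega_transfers_per_s * 1e6 * bus_bytes / 1e9;
}
```

Under that assumption, DDR4-3200 peaks at 25.6 GB/s, DDR4-2400 at 19.2 GB/s (75% of the default), and DDR3-1600 at 12.8 GB/s (half the default). Sustained bandwidth is lower than these peaks because of refresh, bank conflicts, and read/write turnaround.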
Software/Hardware Co-Debugging
Using Printf Debugging
Add debug prints to your testbench to correlate with waveform events:
#include <iostream>
#include <chrono>
auto start = std::chrono::steady_clock::now();
auto resp = myCore::process(0, input_ptr);
std::cout << "[T+" << std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::steady_clock::now() - start).count()
<< "µs] Command issued\n";
auto result = resp.get();
std::cout << "[T+" << std::chrono::duration_cast<std::chrono::microseconds>(
std::chrono::steady_clock::now() - start).count()
<< "µs] Response received\n";
Search for these timestamps in your waveform to quickly locate events.
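If you add many of these prints, the repeated std::chrono expressions above can be folded into a small helper. DebugClock is a hypothetical name, not part of Beethoven; it produces the same "[T+<microseconds>µs]" prefix.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <sstream>
#include <string>

// Prefixes messages with the time elapsed since construction, like the
// inline std::chrono prints above. Hypothetical helper, not Beethoven API.
class DebugClock {
 public:
  DebugClock() : start_(std::chrono::steady_clock::now()) {}

  // Microseconds since this DebugClock was constructed.
  long long elapsed_us() const {
    return std::chrono::duration_cast<std::chrono::microseconds>(
               std::chrono::steady_clock::now() - start_)
        .count();
  }

  // Builds "[T+<elapsed>µs] <msg>".
  std::string format(const std::string& msg) const {
    std::ostringstream line;
    line << "[T+" << elapsed_us() << "µs] " << msg;
    return line.str();
  }

  // Writes the formatted line to os (std::cout by default).
  void log(const std::string& msg, std::ostream& os = std::cout) const {
    os << format(msg) << '\n';
  }

 private:
  std::chrono::steady_clock::time_point start_;
};
```

Create one DebugClock before issuing the command, then call clk.log("Command issued") and clk.log("Response received") where the inline prints were. Because steady_clock is monotonic, the timestamps are not affected by wall-clock adjustments.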
Response Timeouts
If responses never arrive, check:
- Protocol Compliance: Ensure req.ready is high when accepting commands
- Response Valid: Verify resp.valid is driven high after processing
- Deadlock: Check for circular dependencies in module state machines
- Memory Initialization: Verify scratchpads are initialized before use
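auto resp = myCore::process(0, ptr);
// Non-blocking poll: the result is empty if the response has not arrived yet.
// A single poll right after issuing a command is often empty even on a healthy
// accelerator, so poll repeatedly (or after a delay) before assuming a stall.
auto result = resp.try_get();
if (!result.has_value()) {
  std::cerr << "Response not ready - accelerator may be stalled\n";
  // Check waveform for state machine or memory issues
}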
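The single try_get() call above samples the response only once, so a natural extension is to poll until a deadline. This sketch assumes, as the snippet does, that the response handle has a try_get() returning an optional-like value with has_value(); the real Beethoven response type may differ, and wait_for_response is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <thread>

// Polls resp.try_get() until it yields a value or the timeout expires, and
// returns the last result (empty on timeout).
// ASSUMPTION: Resp::try_get() returns an optional-like value with has_value(),
// as in the snippet above; the real Beethoven response type may differ.
template <typename Resp>
auto wait_for_response(Resp& resp, std::chrono::milliseconds timeout,
                       std::chrono::microseconds poll_interval =
                           std::chrono::microseconds(100))
    -> decltype(resp.try_get()) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    auto result = resp.try_get();
    if (result.has_value() || std::chrono::steady_clock::now() >= deadline) {
      return result;
    }
    std::this_thread::sleep_for(poll_interval);  // avoid a hot spin loop
  }
}
```

For example, auto result = wait_for_response(resp, std::chrono::milliseconds(500)); followed by the same has_value() check. Choose a timeout well above the accelerator's expected latency, keeping in mind that simulation runs far slower than hardware, so a timeout that suits the FPGA may be much too short in simulation.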
FPGA Debugging
Build Failures
When synthesis or place-and-route fails:
1. Check Timing Reports
# Kria/Vivado
vivado post_route.dcp
report_timing_summary
Look for:
- Negative slack (timing violations)
- High fanout nets
- SLR crossing violations on multi-die FPGAs
2. Resource Utilization
# In Vivado
report_utilization
Check if you've exceeded:
- LUTs, FFs, BRAMs, URAMs
- DSP blocks
- Clock regions (for Kria)
3. AWS F2 AFI Build Failures
AWS stores detailed build logs in S3:
aws s3 cp s3://<your-bucket>/<logs-folder>/ . --recursive
Common issues:
- Routing congestion → Reduce design size or improve floorplanning
- Timing closure → Lower clock frequency or pipeline critical paths
- Resource exhaustion → Check utilization per SLR
AWS AFI builds may report success but produce non-functional AFIs. Always test with a known-good workload after loading.
Runtime Debugging on FPGA
Limited Hardware Debugging Support:
Beethoven does not currently integrate Vivado ILA (Integrated Logic Analyzer) or VIO (Virtual I/O) cores. For hardware debugging:
- Try Simulation First: Reproduce the issue in simulation, where waveforms are available
- Use Response Payloads: Return debug data through AccelResponse payloads
- External Vivado Debug: Manually insert ILA cores in the generated Verilog
- Iterative Refinement: Add debug outputs to your Chisel, rebuild, and redeploy
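One way to return debug data through a response payload is to pack several counters into one wide field in your Chisel code and unpack it on the host. The 64-bit layout below is entirely hypothetical (Beethoven does not define it), and the sketch assumes you can read the field's raw value on the host; match the shifts and widths to whatever your hardware actually packs.

```cpp
#include <cassert>
#include <cstdint>

// HYPOTHETICAL 64-bit debug word packed by your own Chisel code:
//   [3:0]   FSM state            [35:4] cycles in current state
//   [51:36] memory requests      [63]   sticky error flag
struct DebugWord {
  std::uint8_t state;
  std::uint32_t cycles_in_state;
  std::uint16_t mem_requests;
  bool error;
};

// Extracts `width` bits starting at bit `lo` (width must be below 64).
std::uint64_t bits_of(std::uint64_t word, unsigned lo, unsigned width) {
  assert(width > 0 && width < 64 && lo + width <= 64);
  return (word >> lo) & ((std::uint64_t{1} << width) - 1);
}

// Unpacks the hypothetical layout above into named fields.
DebugWord decode_debug_word(std::uint64_t w) {
  return DebugWord{static_cast<std::uint8_t>(bits_of(w, 0, 4)),
                   static_cast<std::uint32_t>(bits_of(w, 4, 32)),
                   static_cast<std::uint16_t>(bits_of(w, 36, 16)),
                   bits_of(w, 63, 1) != 0};
}
```

Keeping the layout defined in exactly one place on each side (the packing logic in hardware, this decoder in software) makes it easier to keep the two in sync when fields change.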
Inserting ILA Manually
After Beethoven generates Verilog, you can insert ILA cores:
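# Run on the synthesized design (after synth_design, before opt_design/place_design).
# my_clock and my_signal are placeholders for nets in your design.
create_debug_core ila_core ila
set_property C_DATA_DEPTH 4096 [get_debug_cores ila_core]
connect_debug_port ila_core/clk [get_nets my_clock]
# probe0's width must equal the number of nets connected to it (32 is an example)
set_property port_width 32 [get_debug_ports ila_core/probe0]
connect_debug_port ila_core/probe0 [get_nets {my_signal[*]}]
Because the debug core is inserted into the synthesized netlist, you must synthesize the Beethoven-generated Verilog first and then re-run implementation. Nets that synthesis optimizes away cannot be probed; setting the MARK_DEBUG property on them helps preserve them.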
Debug AXI Cache/Prot Signals
Some platforms support debugging AXI protocol signals. Currently only AUPZU3Platform has this enabled:
override val hasDebugAXICACHEPROT = true
When enabled, the host can override AXI CACHE and PROT signal values for debugging cache coherency issues.
Common Issues and Solutions
Issue: Accelerator Hangs (No Response)
Symptoms: resp.get() never returns, testbench hangs
Debug Steps:
- Check waveform for resp.valid signal - is it ever asserted?
- Verify state machine in accelerator reaches "response" state
- Look for deadlocks in memory interfaces (e.g., waiting for ready that never comes)
- Check if command was properly decoded (inspect req.bits in waveform)
Common Causes:
- Forgot to drive resp.valid := true.B in response state
- Memory reader/writer stuck waiting for initialization
- Circular dependency in state machine transitions
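When several modules are each stalled waiting on another, writing down who waits on whom as a graph and searching it for a cycle can pinpoint a circular dependency. This is a generic wait-for-graph check, not a Beethoven feature: the edges come from your own reading of the waveform (for example, a producer whose valid is high but whose ready never rises is waiting on the consumer of that port), and find_wait_cycle is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Wait-for graph: an edge A -> B means "A is stalled waiting on B".
// Node names are whatever you observe in the waveform (hypothetical here).
using WaitForGraph = std::map<std::string, std::vector<std::string>>;

// Returns one dependency cycle with its first node repeated at the end,
// or an empty vector if no cycle exists. Depth-first search with
// white (unvisited) / gray (on current path) / black (finished) coloring.
std::vector<std::string> find_wait_cycle(const WaitForGraph& g) {
  enum class Color { White, Gray, Black };
  std::map<std::string, Color> color;  // missing entries default to White
  std::vector<std::string> path;       // nodes on the current DFS path
  std::vector<std::string> cycle;

  std::function<bool(const std::string&)> dfs = [&](const std::string& u) {
    color[u] = Color::Gray;
    path.push_back(u);
    auto it = g.find(u);
    if (it != g.end()) {
      for (const std::string& v : it->second) {
        if (color[v] == Color::Gray) {  // back edge: v is already on the path
          cycle.assign(std::find(path.begin(), path.end(), v), path.end());
          cycle.push_back(v);
          return true;
        }
        if (color[v] == Color::White && dfs(v)) return true;
      }
    }
    path.pop_back();
    color[u] = Color::Black;
    return false;
  };

  for (const auto& entry : g) {
    if (color[entry.first] == Color::White && dfs(entry.first)) break;
  }
  return cycle;
}
```

For example, a graph with reader → writer, writer → fsm, and fsm → reader yields the cycle fsm, reader, writer, fsm.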
Issue: Data Corruption (Wrong Results)
Symptoms: Results don't match expected values
Debug Steps:
- Add reference implementation in software, compare results
- Check memory alignment - addresses must align to dataBytes
- Verify copy_to_fpga()/copy_from_fpga() are called on discrete platforms
- Inspect waveform to see actual data values on memory interfaces
- Check for race conditions in concurrent memory accesses
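A small comparison helper makes the first step concrete: run the software reference on the same inputs, then report where the accelerator's output diverges. compare_results is a hypothetical helper; it assumes both results are already in host-side std::vectors (on discrete platforms, after copy_from_fpga()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Compares accelerator output with a software reference, prints up to
// max_reports mismatching indices, and returns the total mismatch count
// (elements present in only one vector also count as mismatches).
// Note: uint8_t elements print as characters; widen to int for numbers.
template <typename T>
std::size_t compare_results(const std::vector<T>& expected,
                            const std::vector<T>& actual,
                            std::size_t max_reports = 8,
                            std::ostream& os = std::cerr) {
  if (expected.size() != actual.size()) {
    os << "size mismatch: expected " << expected.size() << " elements, got "
       << actual.size() << '\n';
  }
  const std::size_t n = std::min(expected.size(), actual.size());
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (expected[i] != actual[i]) {
      if (mismatches < max_reports) {
        os << "index " << i << ": expected " << expected[i] << ", got "
           << actual[i] << '\n';
      }
      ++mismatches;
    }
  }
  return mismatches + (std::max(expected.size(), actual.size()) - n);
}
```

Which indices mismatch is often more informative than how many: errors confined to one contiguous block, for instance, may point toward a particular memory transaction rather than the datapath.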
Common Causes:
- Misaligned addresses (see Memory Alignment)
- Forgot to call copy_to_fpga() before accelerator execution
- Endianness mismatch between host and accelerator
- Off-by-one errors in address calculation
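The alignment rule can be checked on the host before issuing a command. This sketch assumes dataBytes is a power of two (as memory bus widths usually are) and is passed in as data_bytes; is_aligned and round_up are hypothetical helpers, not Beethoven API.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: data_bytes stands for the interface's dataBytes and is a
// power of two, so "x % data_bytes" reduces to "x & (data_bytes - 1)".
bool is_power_of_two(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// True if both the start address and the length are multiples of data_bytes.
bool is_aligned(std::uint64_t addr, std::uint64_t len_bytes,
                std::uint64_t data_bytes) {
  assert(is_power_of_two(data_bytes));
  const std::uint64_t mask = data_bytes - 1;
  return (addr & mask) == 0 && (len_bytes & mask) == 0;
}

// Rounds n up to the next multiple of data_bytes (e.g. to pad a buffer length).
std::uint64_t round_up(std::uint64_t n, std::uint64_t data_bytes) {
  assert(is_power_of_two(data_bytes));
  return (n + data_bytes - 1) & ~(data_bytes - 1);
}
```

Rounding buffer lengths up with round_up before allocating is one way to keep transfer lengths aligned; the padding bytes at the end must then be ignored when interpreting results.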
Issue: Timing Violations (Won't Meet Timing)
Symptoms: Post-route timing report shows negative slack
Debug Steps:
- Identify critical path in timing report
- Check if path crosses SLRs (multi-die FPGAs)
- Look for high-fanout nets or long combinational chains
- Review floorplanning constraints (are modules placed optimally?)
Solutions:
- Add pipeline stages on critical paths
- Use RegNext() to break combinational chains
- Improve SLR placement for multi-die FPGAs
- Reduce clock frequency if architecture can't meet timing
- Use LazyModuleWithFloorplan to control placement
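When the fix is to lower the clock, the worst negative slack (WNS) from report_timing_summary gives a first estimate of the achievable frequency: the critical path needs roughly the target period minus WNS. achievable_mhz is a hypothetical helper, and the result is only a starting point, since re-running implementation at a new target can change which path is critical.

```cpp
#include <cassert>

// Rough estimate of the highest clock frequency (MHz) a routed design could
// meet, from the target frequency and the worst negative slack in ns.
// A failing design has WNS < 0, so its critical path needs period - WNS ns.
double achievable_mhz(double target_mhz, double wns_ns) {
  assert(target_mhz > 0.0);
  const double period_ns = 1000.0 / target_mhz;
  const double critical_path_ns = period_ns - wns_ns;
  assert(critical_path_ns > 0.0);
  return 1000.0 / critical_path_ns;
}
```

For example, a 250 MHz target (4 ns period) with WNS of -0.5 ns suggests roughly 222 MHz (1000 / 4.5 ns).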
Issue: Simulation Works, FPGA Fails
Symptoms: Perfect behavior in Verilator/VCS, failures on real hardware
Debug Steps:
- Test with more realistic DRAM model (slower DRAMsim3 config)
- Check for uninitialized registers (simulation often defaults to 0)
- Look for metastability issues (clock domain crossings)
- Verify reset behavior (reset may take longer on FPGA)
Common Causes:
- Insufficient DRAM bandwidth (simulation DRAM is often unrealistic)
- Uninitialized Reg() - always use RegInit()
- Clock domain crossing without proper synchronization
- Race conditions exposed by different timing on FPGA
Issue: Build Takes Forever
Symptoms: Vivado synthesis/P&R runs for hours
Solutions:
- Enable incremental compilation in Vivado
- Use hierarchical synthesis for large designs
- Simplify floorplanning constraints (over-constraining slows P&R)
- For AWS F2: Use smaller instance types for builds (c5.4xlarge is usually sufficient)
Debugging Checklist
Before filing a bug report or asking for help:
- Reproduced in simulation with waveforms
- Checked protocol compliance on all BeethovenIO interfaces
- Verified memory alignment (address and length align to dataBytes)
- Confirmed copy_to_fpga()/copy_from_fpga() are called
- Reviewed the generated beethoven_hardware.h and confirmed it matches expectations
- Tested with a reference software implementation
- Checked timing reports (FPGA builds only)
- Verified resource utilization is within limits
See Also
- Memory Interfaces - Alignment and protocol requirements
- Host Interface - Command/response protocol
- Example - Complete working example with state machine
- Floorplanning - Multi-die timing optimization