Multi-agent collaborative perception has been proposed to leverage the viewpoints of other agents to improve the detection accuracy compared with the individual viewpoint. Recent research has shown the effectiveness of early, late, and intermediate fusion of collaborative perception, which respectively transmits raw data, output bounding boxes, and intermediate features, and the improved collaborative perception results will benefit the self-driving decisions of connected and autonomous vehicles (CAVs).