ZipAccel-C
GZIP/ZLIB/Deflate Data Compressor

ZipAccel-C is a custom hardware implementation of a lossless data compression engine that complies with the Deflate, GZIP, and ZLIB compression standards. 

The core receives uncompressed input files and produces compressed files. No post-processing of the compressed files is required, as the core encapsulates the compressed data payload with the proper headers and footers. Input files can be segmented, and segments from different files can be interleaved at the core’s input.
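
To illustrate what "proper headers and footers" means, here is a minimal sketch that uses Python's standard zlib bindings. It is a software stand-in, not the core's interface. The sketch produces the same data as raw Deflate, ZLIB, and GZIP streams and checks that the three differ only in framing. ZLIB adds a 2-byte header and a big-endian Adler-32 trailer. GZIP adds a minimal 10-byte header plus little-endian CRC-32 and input-size (ISIZE) trailers. The byte-identical payload check reflects zlib's behavior, which is assumed here.

```python
import zlib

def wrap_all(data: bytes, level: int = 6):
    """Compress `data` as raw Deflate (RFC 1951), ZLIB (RFC 1950) and GZIP (RFC 1952)."""
    def run(wbits: int) -> bytes:
        c = zlib.compressobj(level, zlib.DEFLATED, wbits)
        return c.compress(data) + c.flush()
    raw = run(-15)      # negative wbits: bare Deflate, no header or trailer
    zl = run(15)        # 15: ZLIB wrapper, 32 KB window
    gz = run(16 + 15)   # 16 + 15: GZIP wrapper, 32 KB window
    return raw, zl, gz

data = b"ZipAccel-C example payload. " * 64
raw, zl, gz = wrap_all(data)

# ZLIB: 2-byte header (CMF, FLG) + Deflate data + 4-byte big-endian Adler-32
assert zl[2:-4] == raw
assert int.from_bytes(zl[-4:], "big") == zlib.adler32(data)

# GZIP: 10-byte header (ID1=0x1f, ID2=0x8b, CM=8, ...) + Deflate data
# + 4-byte little-endian CRC-32 + 4-byte little-endian ISIZE
assert gz[:3] == b"\x1f\x8b\x08"
assert gz[10:-8] == raw
assert int.from_bytes(gz[-8:-4], "little") == zlib.crc32(data)
assert int.from_bytes(gz[-4:], "little") == len(data) % 2**32

print(len(raw), len(zl), len(gz))
```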
  
The core’s flexible architecture enables fine-tuning of its compression efficiency, throughput, and latency to match the requirements of the end application. Throughputs in excess of 400 Gbps are feasible even at clock rates as low as 500 MHz, and latency can be as low as a few tens of clock cycles.
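
The headline figures translate into per-cycle work with simple arithmetic. This sketch assumes throughput is measured on the uncompressed input side, which the text does not state. The per-engine rate passed to the hypothetical `engines_needed` helper is a placeholder, not a CAST figure.

```python
import math

def bytes_per_cycle(throughput_bps: float, clock_hz: float) -> float:
    """Bytes the datapath must accept every clock cycle to sustain a bit rate."""
    return throughput_bps / clock_hz / 8

def latency_ns(cycles: int, clock_hz: float) -> float:
    """Duration of a number of clock cycles, in nanoseconds."""
    return cycles * 1e9 / clock_hz

def engines_needed(throughput_bps: float, clock_hz: float,
                   engine_bytes_per_cycle: float) -> int:
    """Parallel engines required, given a HYPOTHETICAL per-engine rate."""
    return math.ceil(bytes_per_cycle(throughput_bps, clock_hz) / engine_bytes_per_cycle)

print(bytes_per_cycle(400e9, 500e6))    # 100.0 bytes, i.e. 800 bits per cycle
print(latency_ns(40, 500e6))            # 80.0 ns for 40 cycles
print(engines_needed(400e9, 500e6, 4))  # 25, if each engine handled 4 bytes/cycle
```

At 500 MHz, therefore, 400 Gbps means consuming about 100 input bytes every cycle. This is why the architecture scales by instantiating parallel search engines and Huffman encoders rather than by raising the clock.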
 
ZipAccel-C offers compression efficiency practically equivalent to that of today’s popular Deflate-based software applications. The included software model, together with support from our team of data compression experts, helps users analyze processing speed versus compression efficiency and find the best trade-off for a specific system.
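
When weighing speed against compression efficiency, a software baseline is a useful reference point. This sketch is not the included software model, whose interface is not described here. It simply uses Python's zlib to record the compression ratio and run time at levels 1 through 9. Level 9 corresponds to the maximum-compression setting also requested by `gzip -9`. The sample data is an arbitrary placeholder, so substitute representative files from the target application.

```python
import time
import zlib

def sweep_levels(data: bytes):
    """Return (level, compression ratio, seconds) for zlib levels 1..9."""
    results = []
    for level in range(1, 10):
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(data) / len(compressed), elapsed))
    return results

# Placeholder input: synthetic log-like records
sample = b"".join(f"record {i:06d}: status=OK\n".encode() for i in range(5000))
for level, ratio, secs in sweep_levels(sample):
    print(f"level {level}: ratio {ratio:5.2f}, {secs * 1e3:7.2f} ms")
```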

ZipAccel-C has been designed for ease of use and integration. It operates on a standalone basis, offloading the demanding task of data compression from the host CPU and, optionally, the task of encrypting the compressed stream as well. AXI-Stream or native FIFO-like streaming data interfaces ease SoC integration.

Technology mapping is straightforward, as the design is LINT-clean, scan-ready, microcode-free, and uses easily replaceable, generic memory models. Memory blocks can optionally support Error Correction Codes (ECC) to simplify meeting Enterprise-Class reliability requirements. Furthermore, input file segmentation can limit inter-file latency and help users achieve Quality of Service (QoS) objectives.
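
Segmentation and interleaving can be modeled in software by keeping one compression context per file. Segments arrive in any interleaved order, and each segment is appended to its own file's context. Each file is then closed independently, so a short file does not have to wait for a long one to finish. This sketch uses Python's zlib as a stand-in. The `(file_id, chunk, is_last)` tuple is a hypothetical framing, not the core's actual interface signals, and file IDs are assumed to be unique.

```python
import zlib

def compress_interleaved(segments):
    """Compress interleaved file segments into one GZIP stream per file.

    segments: iterable of (file_id, chunk, is_last) tuples in arrival order.
    Each file keeps its own compression context (a hypothetical model of
    per-file state), so files never share history, headers or trailers.
    """
    contexts = {}
    outputs = {}
    for file_id, chunk, is_last in segments:
        ctx = contexts.get(file_id)
        if ctx is None:
            # 16 + 15: GZIP wrapper with a 32 KB history window
            ctx = contexts[file_id] = zlib.compressobj(6, zlib.DEFLATED, 16 + 15)
            outputs[file_id] = bytearray()
        outputs[file_id] += ctx.compress(chunk)
        if is_last:
            outputs[file_id] += ctx.flush()   # final block + CRC-32/ISIZE trailer
            del contexts[file_id]
    return {fid: bytes(buf) for fid, buf in outputs.items()}

def split(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]

file_a = b"long log line, repeated many times\n" * 300
file_b = b"short file"
parts = {"a": split(file_a, 1024), "b": split(file_b, 1024)}

# Round-robin arrival order: a0, b0, a1, a2, ...
# File b completes after its first segment instead of waiting for all of file a.
arrival = []
for i in range(max(len(p) for p in parts.values())):
    for fid, chunks in parts.items():
        if i < len(chunks):
            arrival.append((fid, chunks[i], i == len(chunks) - 1))

streams = compress_interleaved(arrival)
assert zlib.decompress(streams["a"], 31) == file_a
assert zlib.decompress(streams["b"], 31) == file_b
```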
 

Support

The core as delivered is warranted against defects for ninety days from purchase. Thirty days of phone and email technical support are included, starting with the first interaction. Additional maintenance and support options are available.

Deliverables

The core is available in synthesizable HDL (SystemVerilog) and targeted FPGA netlist forms, and includes everything required for successful implementation. Its deliverables include:

  • Sophisticated Test Environment
  • Simulation scripts, test vectors, and expected results
  • Synthesis script
  • Comprehensive user documentation

ZipAccel-C reference designs have been evaluated in a variety of technologies. ZipAccel-C performance can scale by instantiating more search engines and/or Huffman encoders. Furthermore, other design options, such as the history search window size, affect silicon resource utilization.

Throughputs above 400 Gbps are feasible at clock rates as low as 500 MHz, and the silicon footprint can be less than 100K gates. Contact CAST Sales for help defining suitable configuration options and estimating implementation results for your specific system.

The ZipAccel-C can be mapped to any AMD FPGA device, provided sufficient silicon resources are available. The following table provides sample AMD results for a subset of the possible configuration options. They do not represent the smallest possible area requirements or the highest possible clock frequency. Please contact CAST for characterization data for your target configuration and technology.

Family / Device | Configuration | Freq. (MHz) | LUTs | RAM Blocks
Kintex UltraScale+ / ku9p-1-e | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 450 | 5,715 | 1
Artix UltraScale+ / au25p-1-e | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 4kB History Window | 180 | 18,073 | 23
Spartan-7 / 7s100-1 | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 103 | 19,190 | 103
Versal Premium / vp1202-2MP-e-L | 10 Systolic Search Engines, 10 Static Huffman Encoders, 256B History Window | 300 | 47,383 | 15
Kintex UltraScale / ku085-1-c | 10 Hash Search Engines, 2 Dynamic Huffman Encoders, 4kB History Window | 100 | 144,724 | 218
Virtex UltraScale+ / vu9p-1 | 36 Hash Search Engines, 6 Dynamic Huffman Encoders, 16kB History Window | 250 | 578,403 | 692

The ZipAccel-C can be mapped to any Intel FPGA device, provided sufficient silicon resources are available. The following table provides sample Intel results for a subset of the possible configuration options. They do not represent the smallest possible area requirements or the highest possible clock frequency. Please contact CAST for characterization data for your target configuration and technology.

Family (Speed Grade) | Configuration | Freq. (MHz) | ALMs | RAM Bits
Agilex (-3) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 450 | 7,021 | 2,040
Arria 10 GX (-3) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window | 320 | 13,641 | 5,656
Arria 10 GX (-3) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window | 210 | 40,668 | 683,581
Agilex (-3) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 250 | 17,091 | 1,390,662
Arria 10 GX (-3) | 4 Hash Search Engines, 1 Dynamic Huffman Encoder, 4kB History Window | 110 | 64,044 | 1,513,623
Agilex (-3) | 26 Systolic Search Engines, 26 Static Huffman Encoders, 2kB History Window | 450 | 438,617 | 2,055,946
Agilex (-1) | 30 Search Engines, 10 Static Huffman Encoders, 256B History Window | 500 | 319,724 | 7,471,974

The core can be mapped to any Lattice FPGA, provided sufficient silicon resources are available. The following are sample implementation results for a small subset of the possible configuration options of the core on CertusPro-NX and Avant-E devices. They do not represent the smallest possible area requirements or the highest possible clock frequency.

Family / Device | Configuration | Freq. (MHz) | Logic Resources | Memory Resources
CertusPro-NX / LFCPNX-100 (-9) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 155 | 16,694 Slices | 15 EBR
CertusPro-NX / LFCPNX-100 (-9) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window | 140 | 35,941 Slices | 16 EBR
CertusPro-NX / LFCPNX-100 (-9) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window | 110 | 27,943 Slices | 65 EBR
CertusPro-NX / LFCPNX-100 (-9) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 100 | 28,022 Slices | 99 EBR
Avant-E / LAV-AT-500E (-1) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 512B History Window | 180 | 7,734 Slices | 8 EBR
Avant-E / LAV-AT-500E (-1) | 1 Systolic Search Engine, 1 Static Huffman Encoder, 2kB History Window | 175 | 23,338 Slices | 8 EBR
Avant-E / LAV-AT-500E (-1) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 8kB History Window | 145 | 19,271 Slices | 48 EBR
Avant-E / LAV-AT-500E (-1) | 1 Hash Search Engine, 1 Dynamic Huffman Encoder, 32kB History Window | 140 | 19,253 Slices | 50 EBR

Features List

Compression Standards 

  • Deflate (RFC-1951)
  • ZLIB (RFC-1950)
  • GZIP (RFC-1952)

Deflate Features

  • LZ77 with configurable block and search window size
  • Static and dynamic Huffman
  • Optional stored deflate blocks 
  • Dynamic mode selection 
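
Every Deflate block starts with a 3-bit header (RFC 1951). The first bit is BFINAL, followed by a 2-bit BTYPE that selects stored (00), static Huffman (01), or dynamic Huffman (10) coding. This sketch uses Python's zlib to produce raw Deflate streams and read the first block's type. The level and strategy choices are zlib's, and how ZipAccel-C's own dynamic mode selection chooses between block types is not described here.

```python
import zlib

BLOCK_TYPES = {0: "stored", 1: "static (fixed) Huffman", 2: "dynamic Huffman", 3: "reserved"}

def first_block_type(data: bytes, level: int,
                     strategy: int = zlib.Z_DEFAULT_STRATEGY) -> str:
    """Compress to raw Deflate and decode the first block's BTYPE field."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    raw = c.compress(data) + c.flush()
    header = raw[0]
    # Deflate packs bits LSB-first: bit 0 = BFINAL, bits 1-2 = BTYPE
    return BLOCK_TYPES[(header >> 1) & 0b11]

text = b"static and dynamic Huffman coding, stored blocks. " * 200
print(first_block_type(text, 0))                # stored
print(first_block_type(text, 6, zlib.Z_FIXED))  # static (fixed) Huffman
print(first_block_type(text, 9))                # static or dynamic, whichever zlib estimates is smaller
```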

Flexible Architecture 

  • Fine-tune Throughput, Compression Efficiency, and Latency to match application requirements
    • More than 400Gbps with one core instance, scalable to meet any throughput requirement 
    • Compression efficiency can be on par with Unix/Linux max compression option (gzip -9)  
    • Silicon requirements start from less than 100k gates
    • Latency under 40 clock cycles with Static Huffman encoding
  • Configuration options (partial list):
    • Search engine and Huffman encoder architecture
    • History search window size (up to 32KB)
    • Deflate block size
    • Stored blocks support
    • Parallel processing level
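
The history window bounds how far back LZ77 can look for a match. This sketch uses zlib's `windowBits` as a software analogue of the core's History Window option. Note that zlib supports windows from 512 B to 32 KB only, so the 256B configurations listed in the tables cannot be modeled this way. A random 4 KB block repeated twice compresses to roughly half its size with a 32 KB window. With a 512-byte window it stays essentially incompressible, because the repeat lies out of reach.

```python
import random
import zlib

def raw_deflate(data: bytes, window_bits: int) -> bytes:
    """Raw Deflate with a 2**window_bits byte history window (9..15 in zlib)."""
    c = zlib.compressobj(9, zlib.DEFLATED, -window_bits)
    return c.compress(data) + c.flush()

block = random.Random(2024).randbytes(4096)   # incompressible on its own
data = block + block                          # second copy lies 4096 bytes back

small = raw_deflate(data, 9)    # 512-byte window: the repeat is out of reach
large = raw_deflate(data, 15)   # 32 KB window: the repeat becomes back-references
print(len(data), len(small), len(large))
```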

Easy to Use and Integrate

  • Processor-free, standalone operation  
  • AXI-Stream or native FIFO-like data interfaces
  • Large file segmentation enables meeting QoS objectives
  • Microcode-free, scan-ready design
  • Optional ECC memories
  • Optionally integrated with DMA, encryption or other cores from CAST
  • Complete, turn-key Accelerator Designs available on FPGA boards from different vendors

Resources

Applicable Standards
  • RFC 1952 – GZIP File Format
  • RFC 1950 – ZLIB Compressed Data Format
  • RFC 1951 – DEFLATE Compressed Data Format

Background & More Info
  • Data Compression in Solid State Storage, presentation at Flash Memory Summit 2013 (PDF)
  • Wikipedia entries on GZIP, ZLIB, and Deflate
  • An explanation of the Deflate algorithm by Antaeus Feldspar
  • GZIP Project website
  • ZLIB Project website
