Plumerai provides a customizable AI workload accelerator for customers that require the most energy-efficient solution. Implemented in low-power FPGAs, it delivers a 10x performance improvement over ARM Cortex-M4-based microcontrollers. The IP core is model-independent, so no FPGA reprogramming is needed when the model is updated. Our architecture can use both internal RAM and external memory, and it supports both small and large models. The IP core is offered in several configurations, the smallest using around 11,000 LUTs, so it fits easily into small-footprint, low-power FPGA devices.