Phi-3-mini-4k Performance

GPU | Runs Q_2? | Runs Q_4? | Runs Q_6? | Runs Q_8? | Runs Unquantized?
Apple M1 (8 GB)
Apple M1 (16 GB)
Apple M1 Max (32 GB)
Apple M1 Max (64 GB)
Apple M1 Pro (16 GB)
Apple M1 Pro (32 GB)
Apple M1 Ultra (64 GB)
Apple M1 Ultra (128 GB)
Apple M2 (8 GB)
Apple M2 (16 GB)
Apple M2 (24 GB)
Apple M2 Max (96 GB)
Apple M2 Pro (16 GB)
Apple M2 Pro (32 GB)
Apple M2 Ultra (192 GB)
Apple M3 (8 GB)
Apple M3 (16 GB)
Apple M3 Max (36 GB)
Apple M3 Max (48 GB)
Apple M3 Pro (18 GB)
Apple M3 Pro (36 GB)
NVIDIA A10 (24 GB)
NVIDIA A100 (40 GB)
NVIDIA A100 (80 GB)
NVIDIA A16 (64 GB)
NVIDIA A2 (16 GB)
NVIDIA A30 (24 GB)
NVIDIA A40 (48 GB)
NVIDIA GTX 1050 (2 GB)
NVIDIA GTX 1050 (4 GB)
NVIDIA GTX 1060 (3 GB)
NVIDIA GTX 1060 (6 GB)
NVIDIA GTX 1070 (8 GB)
NVIDIA GTX 1080 (8 GB)
NVIDIA GTX 1080 Ti (11 GB)
NVIDIA H100 (80 GB)
NVIDIA L4 (24 GB)
NVIDIA L40 (48 GB)
NVIDIA M10 (32 GB)
NVIDIA M4 (4 GB)
NVIDIA M40 (12 GB)
NVIDIA M40 (24 GB)
NVIDIA M6 (8 GB)
NVIDIA M60 (16 GB)
NVIDIA P100 (12 GB)
NVIDIA P100 (16 GB)
NVIDIA P4 (8 GB)
NVIDIA P40 (24 GB)
NVIDIA P6 (16 GB)
NVIDIA RTX 2060 (6 GB)
NVIDIA RTX 2060 (12 GB)
NVIDIA RTX 2070 (8 GB)
NVIDIA RTX 2080 (8 GB)
NVIDIA RTX 2080 Ti (11 GB)
NVIDIA RTX 3050 (6 GB)
NVIDIA RTX 3050 (8 GB)
NVIDIA RTX 3060 (8 GB)
NVIDIA RTX 3060 (12 GB)
NVIDIA RTX 3060 Ti (8 GB)
NVIDIA RTX 3070 (8 GB)
NVIDIA RTX 3070 Ti (8 GB)
NVIDIA RTX 3080 (10 GB)
NVIDIA RTX 3080 (12 GB)
NVIDIA RTX 3080 Ti (12 GB)
NVIDIA RTX 3090 (24 GB)
NVIDIA RTX 3090 Ti (24 GB)
NVIDIA RTX 4060 (8 GB)
NVIDIA RTX 4060 Ti (8 GB)
NVIDIA RTX 4060 Ti (16 GB)
NVIDIA RTX 4070 (12 GB)
NVIDIA RTX 4070 Ti (12 GB)
NVIDIA RTX 4070 Ti SUPER (16 GB)
NVIDIA RTX 4080 (16 GB)
NVIDIA RTX 4080 SUPER (16 GB)
NVIDIA RTX 4090 (24 GB)
NVIDIA V100 (16 GB)
NVIDIA V100 (32 GB)
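
The "Runs ...?" columns come down to whether the model weights fit in the device's memory. The page does not state its exact criterion, so the following is only a rough sketch under stated assumptions: Phi-3-mini has about 3.8 billion parameters, Q_n means roughly n bits per weight, "Unquantized" means 16-bit weights, and a hypothetical 20% margin is reserved for the KV cache and runtime buffers. The names `weight_gib`, `fits`, and `row` are illustrative and do not come from any library.

```python
# Rough memory-fit estimate for Phi-3-mini at different quantization levels.
# Assumptions (not taken from the table): ~3.8e9 parameters, Q_n ~ n bits per
# weight, unquantized = 16-bit weights, 20% overhead for KV cache and buffers.

PHI3_MINI_PARAMS = 3.8e9

# Column name -> assumed bits per weight.
LEVELS = {"Q_2": 2, "Q_4": 4, "Q_6": 6, "Q_8": 8, "Unquantized": 16}


def weight_gib(bits_per_weight: float, params: float = PHI3_MINI_PARAMS) -> float:
    """Approximate size of the weights in GiB."""
    return params * bits_per_weight / 8 / 2**30


def fits(memory_gb: float, bits_per_weight: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus the assumed overhead fit in memory_gb.

    The advertised memory size is treated as GiB for simplicity.
    """
    needed = weight_gib(bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


def row(memory_gb: float) -> dict:
    """Estimated yes/no answer for each column of the table."""
    return {name: fits(memory_gb, bits) for name, bits in LEVELS.items()}


if __name__ == "__main__":
    for label, mem in [("2 GB card", 2), ("8 GB card", 8), ("24 GB card", 24)]:
        print(label, row(mem))
```

Real quantization formats usually spend slightly more than n bits per weight, and on Apple silicon only part of the unified memory is available to the GPU, so results near a threshold should be treated as uncertain.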