GPU Optimization for Cost-Effective AI Inference: An Agency's Guide to Cutting Cloud Bills