2 x 5060 Ti: Any better configs for Qwen 3.6 27B / 35B-A3B?

I have been trying various setups, quants, etc. for Qwen 3.6 27B and 35B-A3B on my 2 x 5060 Ti 16 GB rig. I am wondering whether others with similar setups are seeing similar numbers, or whether there is anything left to tweak. So far, all my attempts at speculative decoding have failed with very poor performance, supposedly due to PCIe bandwidth limits.