We revisit and improve performance of arithmetic in the binary GLS254 curve by introducing the 2DT-GLS scalar multiplication algorithm. The algorithm includes theoretical and practice-oriented contributions of potential independent interest: (i) for the first time, a proof that the GLS scalar multiplication algorithm does not incur exceptions, such that faster incomplete formulas can be used; (ii) faster dedicated atomic formulas that alleviate the cost of precomputation; (iii) a table compression technique that reduces the storage needed for precomputed points; (iv) a refined constant-time scalar decomposition algorithm that is more robust to rounding. We also present the first GLS254 implementation for Armv8. With our contributions, we set new speed records for constant-time scalar multiplication by and on 64-bit Arm and Intel platforms, respectively.