We design a state-of-the-art software implementation of field and elliptic curve arithmetic in standard Koblitz curves at the 128-bit security level. Field arithmetic is carefully crafted by using the best formulae and implementation strategies available, and the increasingly common native support to binary field arithmetic in modern desktop computing platforms. The i-th power of the Frobenius automorphism on Koblitz curves is exploited to obtain new and faster interleaved versions of the well-known $τ$NAF scalar multiplication algorithm. The usage of the $τ^łfloor m/3 f̊loor$ and $τ^łfloor m/4 ł̊oor$ maps are employed to create analogues of the 3-and 4-dimensional GLV decompositions and in general, the $łfloor m/s o̊̊or$-th power of the Frobenius automorphism is applied as an analogue of an $s$-dimensional GLV decomposition. The effectiveness of these techniques is illustrated by timing the scalar multiplication operation for fixed, random and multiple points. To our knowledge, our library was the first to compute a random point scalar multiplication in less than 10^5 clock cycles among all curves with or without endomorphisms defined over binary or prime fields. The results of our optimized implementation suggest a trade-off between speed, compliance with the published standards and side-channel protection. Finally, we estimate the performance of curve-based cryptographic protocols instantiated using the proposed techniques and compare our results to related work.