We describe techniques for implementing cryptographic algorithms in software for resource-constrained ARM devices. The target platforms are the Cortex-M and Cortex-A families of processors typical of embedded systems, located toward the mid to lower end of the ARM spectrum of architectures. The implementations include the Fantomas and PRESENT lightweight block ciphers and curve-based primitives for key exchange and digital signatures. We substantially improve on state-of-the-art implementations of these algorithms in terms of efficiency or compactness by making use of novel algorithmic techniques and features specific to the target platforms.