Break things, write reports

Generating fast hashes

Need to generate fast hashes? Using MD5? Check out xxHash and feel the speed

Tags — | xxHash | Categories: — dev |
Posted at — Oct 13, 2020

If you’re writing code, and need to generate deterministic identifiers for data - such as if you’re scraping a website, and your code looks like this:

>>> import hashlib
>>> # import requests, scrape some page, pull title/content, etc
>>> m = hashlib.sha256()
>>> m.update(b"Some website title")
>>> m.update(b"Some shocking content, that might be on a webpage \
... that you may, or may not want to hash, such that you could \
... determine if the website content sneakily changes on you \
... such as, a news website that may silently just change details \
... subtly on you :)")
>>> m.hexdigest()
'0babd903a7d76d63c0f9fcbb76b35affd5eddfd9828d25b59d3a61491ae9571f'
>>> # Go check backend, insert if new, etc

Then you’re burning CPU cycles a lot, for properties you probably didn’t want: i.e. taking forever to defeat someone just bruteforcing a password dump. Instead, consider using something like xxHash (https://github.com/Cyan4973/xxHash), which is going to be significantly quicker, and save a lot of resources.

According to the published benchmarks, MD5 was performing at 0.6GB/s, but xxHash was performing at 31.5GB/s on an 2018 edition i7 - a significant improvement.

Food for thought!