I have a number of files that will live on a server. Users have the ability to create these kinds of files (plists) on-device which will then upload to said server (CloudKit). I would like to unique them by content (the uniquing methodology should be resilient to variations in creation date). My understanding is that I should hash these files in order to obtain unique file names for them. My questions are:
Thanks so much!
Create a cryptographic hash of each file's contents and use that for uniqueness comparisons. SHA-256 is a current hash function, and on iOS with Common Crypto it is quite fast: on an iPhone 6S, SHA-256 processes about 1 GB/second, not counting I/O time. If you need fewer bytes, just truncate the hash.
An example using Common Crypto (Swift 3):
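To illustrate the truncation remark, here is a minimal sketch that shortens a digest and hex-encodes it for use as a name. The helper `truncatedHexName` and its `byteCount` parameter are hypothetical names made up for this illustration. The input is the SHA-256 digest of "testString", taken from the Base64 output shown in this answer. Keep in mind that a shorter hash has a higher chance of collisions, so do not truncate more than you need to.

```swift
import Foundation

// Hypothetical helper (not part of Common Crypto): keeps the first
// `byteCount` bytes of a digest and returns them as lowercase hex.
func truncatedHexName(digest: Data, byteCount: Int) -> String {
    // Clamp the requested length to what the digest actually contains.
    let count = max(0, min(byteCount, digest.count))
    return digest.prefix(count)
        .map { String(format: "%02hhx", $0) }
        .joined()
}

// SHA-256 of "testString", decoded from the Base64 output in this answer.
let fullDigest = Data(base64Encoded: "Ss8LOdnEdmcJo2ifVTrAGrVQVF/6RUTfwLLOqC+6AqM=")!

// Keep 16 of the 32 bytes (128 bits).
let shortName = truncatedHexName(digest: fullDigest, byteCount: 16)
print("shortName: \(shortName)")
// prints "shortName: 4acf0b39d9c4766709a3689f553ac01a"
```

The truncated value is simply the leading part of the full hex string, so a full hash stored earlier can still be compared against a truncated one by prefix.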
For hashing a string:
// Returns the 32-byte SHA-256 digest of the UTF-8 bytes of `string`.
func sha256(string: String) -> Data {
    let messageData = string.data(using:String.Encoding.utf8)!
    // Output buffer sized for one SHA-256 digest.
    var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))

    _ = digestData.withUnsafeMutableBytes {digestBytes in
        messageData.withUnsafeBytes {messageBytes in
            CC_SHA256(messageBytes, CC_LONG(messageData.count), digestBytes)
        }
    }
    return digestData
}
let testString = "testString"
let testHash = sha256(string:testString)
print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")
let testHashBase64 = testHash.base64EncodedString()
print("testHashBase64: \(testHashBase64)")
Output:
testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
testHashBase64: Ss8LOdnEdmcJo2ifVTrAGrVQVF/6RUTfwLLOqC+6AqM=
Note: Add to your Bridging Header:
#import <CommonCrypto/CommonCrypto.h>
For hashing data:
// Returns the 32-byte SHA-256 digest of `data`.
func sha256(data: Data) -> Data {
    // Output buffer sized for one SHA-256 digest.
    var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))

    _ = digestData.withUnsafeMutableBytes {digestBytes in
        data.withUnsafeBytes {messageBytes in
            CC_SHA256(messageBytes, CC_LONG(data.count), digestBytes)
        }
    }
    return digestData
}
let testData: Data = "testString".data(using: .utf8)!
print("testData: \(testData.map { String(format: "%02hhx", $0) }.joined())")
let testHash = sha256(data:testData)
print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")
Output:
testData: 74657374537472696e67
testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
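Since the question is about files, the data version can be pointed at a file on disk. The following is a sketch under some assumptions: it uses `import CommonCrypto` directly (available in newer toolchains, Xcode 10 and later) instead of the bridging header, and the names `sha256(fileAt:)`, `hexString`, and the temporary file name are made up for illustration. Only the file's bytes are hashed, so file-system metadata such as the creation date does not influence the result. A date stored inside the plist itself would still change the hash.

```swift
import Foundation
import CommonCrypto

// Hypothetical helper: SHA-256 of the raw bytes of the file at `url`.
// The whole file is read into memory, which is fine for small plists.
func sha256(fileAt url: URL) throws -> Data {
    let fileData = try Data(contentsOf: url)
    var digest = [UInt8](repeating: 0, count: Int(CC_SHA256_DIGEST_LENGTH))
    fileData.withUnsafeBytes { buffer in
        _ = CC_SHA256(buffer.baseAddress, CC_LONG(fileData.count), &digest)
    }
    return Data(digest)
}

// Hypothetical helper: lowercase hex encoding of a digest.
func hexString(_ data: Data) -> String {
    return data.map { String(format: "%02hhx", $0) }.joined()
}

// Write the same bytes used in the examples above to a temporary file.
let fileURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("sha256-example.tmp")
try "testString".data(using: .utf8)!.write(to: fileURL)

let fileHash = try sha256(fileAt: fileURL)
print("fileHash: \(hexString(fileHash))")
```

Because the file holds the same bytes as `testData`, the printed hash should match the `testHash` value above. The hex string (or a truncated form of it) can then serve as the file or record name.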
Also see Martin's link.