[SOLVED] Ensure integrity of transmitted data

Suppose a software application (e.g. an IIS website) is to be deployed on customers' computers. For licensing and invoicing purposes, the application periodically sends certain usage statistics (e.g. number of managed entities) back to a central service owned by the software company. What is the best way to ensure the integrity of the transmitted data (i.e. that an attacker with administrative rights on the customer's system cannot send fake data to the central service)?

The obvious way would be to hardcode a private key in the application code to sign the data, and to obfuscate the code. This would ensure integrity only as long as nobody recovers the key from the obfuscated code. But can we do better than security by obscurity?

Solution

You can't ensure the integrity of the data. What you can do is "best effort" integrity.

Given this extra info from the comments:

The threat model is a cheating customer who can inspect but not modify the code, and who controls the communication channel to the central service

With that threat model in mind, here is how far best effort can go.

Best Effort

The basic threat is a replay attack: the attacker can record the communication between the app and the central service with e.g. Wireshark and then re-send it.
TLS protects you from that.

But TLS is usually provided by a native library or the OS itself (e.g. OpenSSL), and an attacker who controls that implementation can easily build a setup in which the TLS traffic is decrypted and replayed.
Adding a request ID and a timestamp to each report mitigates this: the server refuses anything older than, say, 24 hours, and anything whose request ID it has already seen within that window.
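
As a rough illustration of that server-side check, here is a minimal Python sketch. The class name ReplayGuard, its in-memory dictionary, and the exact 24h window are assumptions made for the example; a real service would presumably keep seen IDs in shared storage.

```python
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24h window from the text; tune as needed


class ReplayGuard:
    """Hypothetical helper: rejects reports that are stale or reuse a request ID."""

    def __init__(self, max_age=MAX_AGE_SECONDS):
        self.max_age = max_age
        self.seen = {}  # request_id -> timestamp of the accepted report

    def accept(self, request_id, timestamp, now=None):
        now = time.time() if now is None else now
        # Forget IDs whose reports have aged out of the window;
        # a replay of those is rejected by the staleness check below anyway.
        self.seen = {rid: ts for rid, ts in self.seen.items()
                     if now - ts <= self.max_age}
        if abs(now - timestamp) > self.max_age:
            return False  # too old, or dated too far in the future
        if request_id in self.seen:
            return False  # same request ID already accepted: a replay
        self.seen[request_id] = timestamp
        return True
```

Note that this only stops verbatim replays. It does nothing against a client that generates a fresh request ID and timestamp itself.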

The next threat is then the attacker sending a modified report with a new request ID and timestamp, and probably other modified fields that make their cheating worthwhile. Your next line of defense is then to obfuscate the payload.

Depending on how deep the rabbit hole goes, the attacker can still reverse engineer the payload format using tools like ltrace / gdb / jdb / Visual Studio, or by studying the bytecode / assembly code / CIL code.
The next thing to do, then, is to encrypt and/or sign the requests in the application using keys embedded in the application.
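
A minimal sketch of what signing with an embedded key could look like, in Python. It uses an HMAC over a shared secret as a stand-in for whatever signature scheme you would actually pick; the key value, the function names and the report fields are all made up for the example.

```python
import hashlib
import hmac
import json

# Hypothetical embedded key. Anyone who can read the application's files
# can recover it, which is exactly the weakness discussed in this answer.
EMBEDDED_KEY = b"example-key-shipped-inside-the-app"


def sign_report(report, key=EMBEDDED_KEY):
    """Serialize the report deterministically and compute an HMAC-SHA256 tag."""
    body = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, tag


def verify_report(body, tag, key=EMBEDDED_KEY):
    """Server side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Example usage with invented field names.
body, tag = sign_report(
    {"request_id": "r1", "timestamp": 1700000000, "managed_entities": 42}
)
```

The server can now tell that a report was altered in transit, but only for as long as the key stays secret.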

But if the attacker owns the machine and has access to the application's files, it's not hard to find the keys.
You could, perhaps, split the keys into many small, unrecognizable pieces, hide them in different places in the app, and have some code re-assemble the keys at runtime.
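
One way to picture this is XOR splitting, sketched below in Python. The function names are invented for the example, and where the pieces would be hidden in a real app is left out entirely.

```python
import secrets


def split_key(key, parts=3):
    """Split key into `parts` random-looking byte strings that XOR back to it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    last = key
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def reassemble_key(shares):
    """XOR all shares together to recover the original key."""
    key = bytes(len(shares[0]))  # all-zero starting value
    for share in shares:
        key = bytes(a ^ b for a, b in zip(key, share))
    return key
```

No single piece looks like a key, but the re-assembly code has to ship with the app, so this only raises the effort needed.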

But if the attacker sees that it's signed/encrypted, they're going to look for keys, and when they don't find them, they'll try to trace back from the network call to the creation of the report and see how the application gets the keys.

Perhaps a better option is to set up another service that hands out keys: the application would request a key and use that to encrypt the message for the central service.

But of course the attacker can call this service too.

Another option: these days there's hardware that is designed to resist reverse engineering, and, depending on who your customers are, your app could refuse to start unless that hardware is present.

I hope you can see how this best effort turns your billing infrastructure into more of a nightmare for you than for the attacker.

The Common Solution

Usually what companies do in this case is to put the usage reporting requirement and a prohibition on reverse engineering in the terms of service, try to detect anomalies in the reports, and, when they find any, investigate and pursue legal action.
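
Anomaly detection does not have to be sophisticated to be useful. As a purely illustrative Python sketch (the function name, the 50% threshold and the minimum history length are assumptions, not a recommendation), the central service could flag a report whose count drops sharply against that customer's own history:

```python
from statistics import median


def is_suspicious(history, new_value, max_drop=0.5):
    """Flag a reported count that falls far below the customer's median so far.

    history   -- previously reported counts for this customer
    new_value -- the count in the report being checked
    max_drop  -- hypothetical threshold: fraction of the median that may vanish
    """
    if len(history) < 3:
        return False  # not enough data to judge
    baseline = median(history)
    return new_value < baseline * (1 - max_drop)
```

A flagged report is only a reason to take a closer look, not proof of cheating.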

Another question you need to ask yourself at the architectural level: What is the likelihood of a customer cheating, and how much is it going to cost me in total, before I detect it and kick the customer out?
In other words: is this problem worth solving?