I have a requirement to make sure that a certain data file shipped along with the software has not been tampered with. So, I thought I could encrypt it. The plan was to use asymmetric encryption with private and public keys. Java supports RSA/ECB/PKCS1PADDING, so I thought of using it. There are also CipherOutputStream and CipherInputStream classes in the javax.crypto package, which I thought of using, and that’s when the problems started. After creating a cipher output stream with a cipher based on RSA/ECB/PKCS1PADDING, writing data to it and closing the stream, the output file had nothing in it.
I did some research on what CipherInputStream does and in the process learnt that ciphers are typically block ciphers or stream ciphers. Block ciphers operate on a fixed-size block, while stream ciphers can work on an arbitrarily long stream of data. However, a block cipher can be applied to longer data through a mode of operation: ECB simply encrypts each block independently, while modes such as CFB, OFB and CTR effectively turn the block cipher into a stream cipher. So, in theory, it should be possible to use the RSA cipher, which can only encrypt a fixed number of bytes at a time, on a whole stream by applying it block by block as ECB does.
So, I manually split the input stream into small blocks and implemented the stream encryption without using CipherOutputStream. For this, I first hardcoded a block size of 128 for each chunk, but that gave an error. It turns out the maximum is 128 - 11 = 117 bytes: 128 is the key size in bytes (a 1024-bit key) and 11 is the minimum overhead of the PKCS#1 padding. After changing the encryption block size to 117, I could successfully encrypt the entire file. I didn’t like the fact that I had hardcoded the values. So, looking at the Cipher API, I decided to use getBlockSize, and that’s when I realized it returns 0 (a reason why the CipherOutputStream didn’t work). Hmm, how can this return zero? The documentation for this method says that for ciphers that are not block ciphers, this value is 0. What?
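
The manual chunking described above can be sketched roughly as follows. This is only an illustration, not the code I actually wrote: the class name RsaChunkDemo and its helper methods are made up for the example, and it encrypts with the public key for simplicity. Instead of hardcoding 117, it derives the chunk size from the key's modulus length minus the 11 bytes of PKCS#1 overhead, which works out to 117 for a 1024-bit key.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.interfaces.RSAPublicKey;
import javax.crypto.Cipher;

// Hypothetical demo class; the names are illustrative only.
final class RsaChunkDemo {
    // Minimum overhead of PKCS#1 v1.5 padding, in bytes.
    static final int PKCS1_OVERHEAD = 11;

    // Largest plaintext chunk the key accepts: modulus length minus padding.
    static int maxChunkSize(RSAPublicKey key) {
        int modulusBytes = (key.getModulus().bitLength() + 7) / 8;
        return modulusBytes - PKCS1_OVERHEAD;
    }

    // Encrypts data chunk by chunk; each chunk yields one modulus-sized block.
    static byte[] encryptInChunks(byte[] data, RSAPublicKey key) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            int chunk = maxChunkSize(key);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int off = 0; off < data.length; off += chunk) {
                int len = Math.min(chunk, data.length - off);
                // doFinal resets the cipher, so it can be reused for the next chunk.
                byte[] block = cipher.doFinal(data, off, len);
                out.write(block, 0, block.length);
            }
            return out.toByteArray();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a throwaway key pair for the demo.
    static KeyPair newKeyPair(int bits) {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(bits);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 1024 bits only to match the numbers in this post; too small for real use.
        RSAPublicKey pub = (RSAPublicKey) newKeyPair(1024).getPublic();
        System.out.println("max chunk: " + maxChunkSize(pub));
        System.out.println("encrypted length: " + encryptInChunks(new byte[300], pub).length);
    }
}
```

Note that every chunk of up to 117 bytes grows to a full 128-byte block, so the output is larger than the input, and RSA is slow. That is one more hint that this is not how RSA is meant to be used.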
Well, it turns out that while it’s possible to split the input stream into small chunks and encrypt them using RSA, the typical usage is to encrypt just a single small chunk. It seems only symmetric keys are used for encrypting streams of data. Typically, a symmetric session key is generated to encrypt the data, and this key itself is encrypted using public/private key encryption. Since the session key is small, it fits within the block size of an RSA cipher (yes, the RSA cipher does have a block size, though it’s not intended to be used as a block cipher).
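
A minimal sketch of that session-key pattern, under some assumptions of mine: AES-128 in GCM mode for the data, RSA/ECB/PKCS1Padding to wrap the session key, and a made-up HybridDemo class with a small Envelope holder for the three pieces that have to travel together. None of these choices come from a particular protocol; they are just one way to illustrate the idea.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo class; the names are illustrative only.
final class HybridDemo {
    // The wrapped session key, the IV and the ciphertext travel together.
    static final class Envelope {
        final byte[] wrappedKey;
        final byte[] iv;
        final byte[] cipherText;

        Envelope(byte[] wrappedKey, byte[] iv, byte[] cipherText) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.cipherText = cipherText;
        }
    }

    static Envelope encrypt(byte[] data, PublicKey rsaPublic) {
        try {
            // 1. Fresh symmetric session key for this message.
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey session = keyGen.generateKey();

            // 2. Encrypt the (arbitrarily long) data with the session key.
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.ENCRYPT_MODE, session, new GCMParameterSpec(128, iv));
            byte[] cipherText = aes.doFinal(data);

            // 3. Encrypt only the small session key with RSA.
            Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            rsa.init(Cipher.WRAP_MODE, rsaPublic);
            return new Envelope(rsa.wrap(session), iv, cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(Envelope env, PrivateKey rsaPrivate) {
        try {
            // Recover the session key with the RSA private key, then the data.
            Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            rsa.init(Cipher.UNWRAP_MODE, rsaPrivate);
            SecretKey session = (SecretKey) rsa.unwrap(env.wrappedKey, "AES", Cipher.SECRET_KEY);

            Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
            aes.init(Cipher.DECRYPT_MODE, session, new GCMParameterSpec(128, env.iv));
            return aes.doFinal(env.cipherText);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a throwaway 2048-bit RSA key pair for the demo.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The 16-byte AES key is far below the RSA size limit, so a single RSA operation is enough no matter how large the data is.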
So, finally I changed my strategy. My requirement has more to do with preventing tampering of the data than with preventing viewing of it. So, I used a SHA message digest (SHA-256, for example), computed the digest of the data file, and then used the RSA cipher to encrypt the digest with the private key. The idea is to ship the data file, the encrypted message digest and the public key. At run time, the program first computes the digest of the data file and compares it with the digest obtained by decrypting the shipped encrypted digest with the public key. If they match, the file has not been tampered with; otherwise it has.
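
Here is a rough sketch of that scheme. The TamperCheck class and its method names are made up for illustration, SHA-256 is my assumption for the digest algorithm, and the key pair is generated on the spot purely for the demo (in practice the private key stays with me and only the public key ships).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.BadPaddingException;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;

// Hypothetical demo class; the names are illustrative only.
final class TamperCheck {
    static final String DIGEST = "SHA-256"; // assumed digest algorithm
    static final String RSA = "RSA/ECB/PKCS1Padding";

    static byte[] digest(byte[] data) throws GeneralSecurityException {
        return MessageDigest.getInstance(DIGEST).digest(data);
    }

    // Build time: encrypt the digest of the data with the private key.
    static byte[] sealDigest(byte[] data, PrivateKey privateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.ENCRYPT_MODE, privateKey);
            return rsa.doFinal(digest(data));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Run time: decrypt the shipped digest with the public key and compare.
    static boolean isUntampered(byte[] data, byte[] sealedDigest, PublicKey publicKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, publicKey);
            byte[] shipped = rsa.doFinal(sealedDigest);
            return MessageDigest.isEqual(shipped, digest(data));
        } catch (BadPaddingException | IllegalBlockSizeException e) {
            return false; // the sealed digest itself was altered or is malformed
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generates a throwaway 2048-bit RSA key pair for the demo.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] data = "licensee=Example Corp".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = sealDigest(data, keys.getPrivate());
        System.out.println("original accepted: " + isUntampered(data, sealed, keys.getPublic()));
        data[0] ^= 1; // simulate tampering with the data file
        System.out.println("tampered accepted: " + isUntampered(data, sealed, keys.getPublic()));
    }
}
```

This is essentially a hand-rolled digital signature. Java's java.security.Signature class (with an algorithm name such as "SHA256withRSA") does the digest and the private-key step in one call, and is probably the better choice for real code.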
BTW, in case you are wondering why I didn’t go with the approach of encrypting the data with a symmetric key and encrypting the symmetric key using the public/private keys: it has a security issue. Unlike two trusted parties communicating with each other using this approach and trying to prevent a third party from reading the message, here the issue is that I, the first party, can’t trust the second party. For example, once the program is run, it would be possible to identify the symmetric key used to encrypt the data by inspecting the RAM, then use that symmetric key to encrypt a different piece of data and overwrite the old encrypted data file. The program would happily accept the tampered data because it could successfully decrypt the data file using the same symmetric key, which doesn’t change (once you ship the software, the key remains the same). Note that the purpose of encrypting the message digest above is not to hide it, because again, by looking at the RAM it would be possible to figure it out. The idea is that nobody else can compute the encrypted message digest of a tampered file, since that can only be done by me using the private key.
Security is an interesting area. There are a handful of tools, and which tool to use when depends on the use case. The data file in the above use case could just as well be a software license that contains the details of the party that licensed the software. One doesn’t care so much that the license details are visible, only that those visible details have not been tampered with.