This is utterly fascinating.
To be clear -- it stores a low-res version in the output file, uses neural networks to predict the full-res version, then encodes the difference between the predicted full-res version and the actual full-res version, and stores that difference as well. (Technically, multiple iterations of this.)
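A minimal sketch of that pipeline, with a trivial nearest-neighbour upsample standing in for the trained super-resolution network (numpy assumed; the function names are mine, not the paper's):

```python
import numpy as np

def encode(img):
    """Split an image into a 2x-downsampled version plus a residual."""
    h, w = img.shape
    # Low-res version: rounded 2x2 block averages (this is what gets stored)
    low = np.round(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))).astype(np.int32)
    # Predictor: nearest-neighbour upsample here; the paper trains a
    # super-resolution network to do this step instead
    pred = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
    residual = img.astype(np.int32) - pred  # this is what gets entropy-coded
    return low, residual

def decode(low, residual):
    pred = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
    return pred + residual  # bit-exact, whatever the predictor did

img = np.random.randint(0, 256, (4, 4))
low, residual = encode(img)
assert np.array_equal(decode(low, residual), img)  # lossless by construction
```

Note the predictor's quality never affects correctness, only the size of the residual; that's why the scheme is lossless even though the network's output isn't exact. The "multiple iterations" part just applies the same split recursively to `low`.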
I've been wondering when image and video compression would start utilizing standard neural network "dictionaries" to achieve greater compression, at the (small) cost of requiring a local NN file that encodes all the standard image "elements".
This seems like a great step in that direction.
Interesting. It sounds like the idea is fundamentally about factoring out knowledge of "real image" structure into a neural net. In a way, this is similar to the perceptual models used to discard data in lossy compression.
This is really interesting but out of my league technically. I understand that super-resolution is the technique of inferring a higher-resolution truth from several lower-resolution captured photos, but I'm not sure how this is used to turn a high-resolution image into a lower-resolution one. Can someone explain this to an educated layman?
I asked a question about a similar idea on Stack Overflow in 2014. https://cs.stackexchange.com/questions/22317/does-there-exis...
They did not have any idea and they were dicks about it as usual.
This technology is super awesome... and it's been available for a while.
A few years ago, I worked for #bigcorp on a product which, among other things, optimized and productized a super resolution model and made it available to customers.
For anyone looking for it - it should be available in several open source libraries (and closed-source #bigcorp packages) as an already-trained, ready-to-deploy model.
On the order of 10% smaller than WebP, substantially slower encode/decode.
Reminds me of this.
Gave me a comical thought if such things can be permitted.
You split the image into RGB and B/W, turn the pictures into blurred vector graphics, then generate an incredibly large spectrum of compression formulas built from separable approaches, each sorted so that one can dial in the most movie-like result.
3d models for the top million famous actors and 10 seconds of speech then deepfake to infinite resolution.
Speech to text with plot analysis since most movies are pretty much the same.
Sure, it won't be lossless, but replacing a few unknown actors with famous ones and having a few accidental happy endings seems entirely reasonable.
Related, from another domain: lossless text compression using an LSTM: https://bellard.org/nncp/
(This is by Fabrice Bellard; one wonders how he manages to achieve so much.)
This is a lot like "waifu2x". That's super-resolution for anime images.
Reminds me of RAISR (https://ai.googleblog.com/2016/11/enhance-raisr-sharp-images...).
I remember talking with the team and they had production apps using it and reducing bandwidth by 30%, while only adding a few hundred kb to the app binary.
and what's the size of the neural network you have to ship for this to work? has anyone done the math on the break-even point compared to other compression tools?
e: actually, a better metric would be how much it compresses with plain Lanczos upscaling in place of the neural net, keeping the delta part intact
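That baseline is easy to sketch: keep the downsample-and-residual pipeline fixed, swap only the upsampler, and compare the entropy of the residuals each one leaves behind (numpy assumed; nearest-neighbour stands in for the upsampler, since pulling in a real Lanczos kernel is beside the point here):

```python
import numpy as np

def residual_entropy(img, upsample):
    """Ideal entropy-coder cost (bits/pixel) of a predictor's residual."""
    h, w = img.shape
    low = np.round(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    residual = img - upsample(low)
    _, counts = np.unique(residual, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Nearest-neighbour stands in for the baseline; a real Lanczos upsampler
# (e.g. Pillow's Image.resize with the LANCZOS filter) or the trained net
# would slot into the same place. Whichever leaves the lowest-entropy
# residual wins, and the gap is the net's real contribution.
nearest = lambda lo: np.repeat(np.repeat(lo, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(1)
smooth = np.cumsum(rng.integers(0, 3, (32, 32)), axis=1).astype(float)
noisy = rng.integers(0, 256, (32, 32)).astype(float)
```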
Does anyone know how much better the compression ratio is compared to PNG, which is also a lossless encoder?
I wonder how well this technique works when the depth of field is infinite?
Out of focus parts of an image should be pretty darned easy to compress using what is effectively a thumbnail.
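That intuition is easy to check with a toy experiment: heavily blurred content leaves essentially nothing in the residual once the thumbnail is upsampled (numpy assumed, nearest-neighbour predictor, and a crude box blur standing in for "out of focus"):

```python
import numpy as np

def residual_bits(img):
    """Entropy (bits/pixel) of the residual after predicting from a 2x thumbnail."""
    h, w = img.shape
    low = np.round(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    residual = img - np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
    _, counts = np.unique(residual, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (64, 64)).astype(float)
# Crude stand-in for "out of focus": an 8x8 box blur kills the high frequencies
blurred = np.round(sharp.reshape(8, 8, 8, 8).mean(axis=(1, 3))).repeat(8, axis=0).repeat(8, axis=1)
assert residual_bits(blurred) < residual_bits(sharp)
```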
That said, the idea of having an image format where 'preview' code barely has to do any work at all is pretty damned cool.
Would massive savings be achieved if an image sharing app like say, Instagram were to adopt it, considering a lot of user-uploaded travel photos of popular destinations look more or less the same?
I believe a big issue with this will be floating point differences. Due to the network being essentially recursive, tiny errors in the initial layers can grow to yield an unrecognizably different result in the final layers.
That's why most compression algorithms use fixed point mathematics.
There are ways to quantize neural networks so they use integer coefficients, but that tends to lose quite a lot of performance.
Still, this is a very promising lead to explore. Thank you for sharing :)
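A toy sketch of the quantization point above (numpy assumed; one global scale per weight matrix, which is the crudest possible scheme): once the weights are integers, the accumulations are exact integer arithmetic and give bit-identical results on every platform, so encoder and decoder can't drift apart.

```python
import numpy as np

def quantize(w, bits=8):
    """Map float weights to signed integers plus one float scale factor."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int32)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
x = np.random.randint(-128, 128, size=4)
q, scale = quantize(w)
acc = q @ x  # deterministic integer accumulation, identical everywhere
# Rounding error per weight is at most half a quantization step:
assert np.abs(q * scale - w).max() <= scale / 2 + 1e-6
```

The performance loss mentioned above comes from exactly that rounding error accumulating through the layers; real schemes use per-channel scales and quantization-aware training to claw some of it back.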
Is this actually lossless - that is, the same pixels as the original are recovered, guaranteed? I'm surprised such guarantees can be made from a neural network.
I thought super-resolution used multiple input files to "enhance", for example extracting a high-res still from a video clip.
This is interesting, but I'm not sure the economics will ever work out. It'll only be practical where the extra computation costs less than the storage and bandwidth it saves.
How do ML based lossy codecs compare to state of the art lossy compression? Intuitively it sounds like something AI will do much better. But this is rather cool.
Looks like FLIF has a slight edge in compression ratio according to the paper, but it beats the other common compression schemes, which is impressive.
How does it work for data other than Open Images, if trained on Open Images? If it recognizes fur, it's going to be great on cat videos.
It seems like "lossless" isn't quite right; some of the information (as opposed to just the algo) seems to be in the NN?
Is a soft-link a lossless compression?
It's like the old joke about a pub where they optimise by numbering all the jokes: the joke number alone can be used to losslessly recover the joke, but only by leaning on the community's shared storage to hold the actual data.