I am working on a content type spoof detector for a web application. Any developer with experience on this subject should be able to help.
My input is an object that exposes its filename, content_type, and io. The object's content_type is determined by a library called Marcel, which picks the most specific MIME type it can guess from the io, the filename, and the file extension.
The issue is that, when Marcel is used this way, the content_type can be spoofed (which is why I am building this detector). A spoofed jpg whose content is text/plain, but which declares an image/jpg content_type and has a .jpg extension, will be reported as image/jpg.
To solve this, I analyze the object's io with the Linux file command to determine the 'real' content_type. But there is a problem with this approach: the file command sometimes returns a content_type that is less precise than the content_type provided by the object, or that is an alias for it.
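For context, a minimal sketch of how the io could be handed to the file command. The helper name `sniff_mime_type`, the `HEAD_BYTES` limit, and the injectable `command` argument are assumptions made for illustration, not part of Marcel or of my actual code:

```ruby
require "open3"
require "stringio"

# Assumption: the first MiB is enough for `file` to recognise the format.
HEAD_BYTES = 1024 * 1024

# Hypothetical helper: pipes the beginning of the IO to `file` on stdin
# and returns whatever MIME type the command prints.
# `command` is injectable so that another detection tool can be swapped in.
def sniff_mime_type(io, command: ["file", "--brief", "--mime-type", "-"])
  io.rewind
  data = io.read(HEAD_BYTES) || "" # read returns nil on an empty IO
  io.rewind                        # leave the IO usable for later consumers

  output, status = Open3.capture2(*command, stdin_data: data, binmode: true)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

  output.strip
end
```

Calling `sniff_mime_type(object.io)` would then give the value that is compared against the content_type reported by the object.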
For example, for .wmv files, Marcel, using the io + filename + extension, is able to determine a video/x-ms-wmv content_type, whereas the file command returns video/x-ms-asf, which is a kind of parent of video/x-ms-wmv. Second example: for .avi files, Marcel returns video/vnd.avi, whereas the file command returns video/x-msvideo, which is an alias for this content_type.
In both cases, these content_types are not equal, but both could be deemed as 'valid' pairs.
The thing is, doing things this way, I need some kind of mapping of these value pairs. So what I am asking SO is: has this content_type mapping already been built somewhere? If not, does anyone know whether building it is a complex task? I guess so, since there are thousands of content_types nowadays...
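To make the idea concrete, here is a rough sketch of the check such a mapping would enable. The `ALIASES` and `PARENTS` tables are hypothetical and seeded only with the two pairs mentioned above, and `valid_pair?` is a made-up name:

```ruby
require "set"

# Hypothetical tables containing only the two examples from this question.
ALIASES = { "video/x-msvideo" => "video/vnd.avi" }.freeze
PARENTS = { "video/x-ms-wmv" => ["video/x-ms-asf"] }.freeze

# Resolve an alias to its canonical name, or return the type unchanged.
def canonical(type)
  ALIASES.fetch(type, type)
end

# Collect the type itself plus every ancestor reachable through PARENTS.
def with_ancestors(type)
  seen = Set.new
  queue = [canonical(type)]
  until queue.empty?
    current = queue.shift
    next unless seen.add?(current) # add? returns nil if already visited
    queue.concat(PARENTS.fetch(current, []).map { |parent| canonical(parent) })
  end
  seen
end

# The pair is accepted when the detected type is the declared type,
# an alias of it, or one of its ancestors.
def valid_pair?(declared, detected)
  with_ancestors(declared).include?(canonical(detected))
end
```

With these tables, the wmv and avi pairs above are accepted, while a declared image/jpg with a detected text/plain is rejected. The check is deliberately one-directional: a detected type that is more specific than the declared one is not accepted.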
Depending on your answers, I might switch to a less precise method that only compares the top-level type (i.e. image/video/application/...) rather than the whole MIME type. This might be enough: if the client claims to send a .jpg, receiving a .png would not be much of an issue, whereas the detector would still reject .exe files, since their type is application and not image.
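That fallback could look roughly like the following sketch, where both helper names are hypothetical:

```ruby
# Hypothetical helper: returns the part before the slash,
# e.g. "image" for "image/png". Returns "" for nil or empty input.
def top_level_type(mime_type)
  mime_type.to_s.split("/", 2).first.to_s.strip.downcase
end

# True when both MIME types share the same non-empty top-level type.
def same_top_level_type?(declared, detected)
  declared_top = top_level_type(declared)
  !declared_top.empty? && declared_top == top_level_type(detected)
end
```

Here `same_top_level_type?("image/jpg", "image/png")` is true, while a declared image/jpg with a detected text/plain or application/... type is false.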
If anyone has experience with this kind of problem, please let me know.
Marcel includes many of these type mappings. For instance:
ext = 'wmv'
# MIME types that Marcel associates with this extension
types_by_extension = Marcel::TYPE_EXTS.filter_map { |type, exts| type if exts.include?(ext) }
#=> ["video/x-ms-wmv"]
# Append the parent types of each match, if it has any
types_by_extension.concat(
  *types_by_extension.filter_map { |type| Marcel::TYPE_PARENTS[type] }
)
#=> ["video/x-ms-wmv", "video/x-ms-asf"]
If you would like to add more, you can use the interface provided by Marcel::MimeType.
The signature for Marcel::MimeType.extend is:
extend(type, extensions: [], parents: [], magic: nil)
So, for instance, when I run the above with 'avi' I only receive
["video/x-msvideo"]
so in order to add 'video/vnd.avi', a simple extension-only option would be to add:
Marcel::MimeType.extend "video/vnd.avi", extensions: %w(avi)
Or possibly even using the "magic" parameter:
Marcel::MimeType.extend "video/vnd.avi", extensions: %w(avi), magic: [[0, "RIFF", [[8, "AVI LIST"]]]]
Out of the box MimeType definitions can be found Here, Here and Here