Scan file uploads for Malware in EPiServer/Optimizely CMS 11

Do you have forms on your website where visitors can upload files? Perhaps CVs for job applications, documentation for claims, images, or other kinds of applications? And have you thought about the risk of these files containing malware and landing right on your production web server? A client of mine has this concern on an EPiServer (now Optimizely) CMS 11 site using the EPiServer Forms extension, so I investigated and found an approach to handle it.

NOTE: This solution applies to CMS 11 - even though it's fairly outdated, there are still a lot of these sites around. It runs on .NET Framework - and hence on a Windows machine. At the end of this blog post I'll discuss how this could be approached for CMS 12.

Basically it comes down to this: What if a visitor to your website uploads a malicious file (with or without intent)? Now, of course this might not be a big issue - the files are stored in a secure place, and are most likely not executed. And whatever antivirus you have will probably get to them in time. But what if it doesn't? And perhaps the form is configured to send everything along right away to an email address where an employee opens the file?

You could definitely argue that the best thing would be if infected bytes never were stored on the server - but blocked before they even reach it.
However, that might be easier said than done. In ASP.NET (.NET Framework) the moment you reach your code in your Form Actors or validators or whatever else you could think of, the posted data has already been parsed and files have been saved to a temporary location.

To get to the data even before that happens, you have to create an HttpModule and attach to the BeginRequest event, where you can access the incoming InputStream. But that's just raw multipart form data - not yet parsed into keys, values and files.
Luckily, there is of course a NuGet package for that - the Http-Multipart-Data-Parser. It can parse the input stream and identify the uploaded files.

Next issue: Now we have a file stream - but how do we identify if it's malware? Again, we're in luck. Since this is running on .NET Framework and classic Episerver, we can assume it's running on Windows, which means we have access to Windows Defender. However - it's only accessible through classic interop. Now, we could wrap the interop into a class ourselves - in fact, there is a great blog post here describing how to do it - but we could also just use the NuGet package so kindly provided. I use version 1.0.8 since it works with .NET Framework.

Putting it all together it looks like this:

using System;
using System.IO;
using System.Web;
using HttpMultipartParser; // from the Http-Multipart-Data-Parser NuGet package
// Also add the using directive for the AMSI NuGet package that provides AmsiContext.

public class UploadScanModule : IHttpModule
{
    public void Init(HttpApplication context)
    {
        context.BeginRequest += (sender, args) =>
        {
            var request = HttpContext.Current.Request;

            // Only inspect multipart POSTs to the EPiServer Forms submit endpoint
            if (request.HttpMethod == "POST"
                && request.Path.Equals("/EPiServer.Forms/DataSubmit/Submit", StringComparison.OrdinalIgnoreCase)
                && request.ContentType.StartsWith("multipart/form-data", StringComparison.OrdinalIgnoreCase))
            {
                using (var ms = new MemoryStream())
                {
                    // Buffer the raw request body for parsing; the original stream is rewound further down
                    request.InputStream.CopyTo(ms);
                    ms.Position = 0;

                    // Parse the multipart form data
                    var parser = MultipartFormDataParser.Parse(ms);
                    foreach (var f in parser.Files)
                    {
                        using (var fileStream = new MemoryStream())
                        {
                            f.Data.CopyTo(fileStream);
                            byte[] body = fileStream.ToArray();
                            using (var application = AmsiContext.Create("EPiServer Site"))
                            using (var session = application.CreateSession())
                            {
                                // Pass the uploaded file name (not the form field name) as the content name
                                var malware = session.IsMalware(body, f.FileName);
                                if (malware)
                                {
                                    request.InputStream.Close();
                                    // Notify the user
                                    HttpContext.Current.Response.StatusCode = 400; // Bad Request
                                    HttpContext.Current.Response.StatusDescription = "Malware detected in uploaded file.";
                                    HttpContext.Current.Response.End();
                                    return;
                                }
                            }
                        }
                    }

                    // Nothing was flagged: rewind so ASP.NET can process the request as usual
                    request.InputStream.Position = 0;
                }
            }
        };
    }

    public void Dispose() { }
}

So, I read the input stream and parse it, then go through every file and scan it. If any file contains malware, I close the input stream and return an error. Otherwise I reset the input stream so ASP.NET can handle the request the way it usually does.
Then I register this HttpModule in web.config along with the other modules, and I'm ready to test it.
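
For reference, the registration could look roughly like the fragment below. This is a minimal sketch assuming IIS integrated pipeline mode; the type and assembly names (MySite.Business.UploadScanModule, MySite) are placeholders I made up, so swap in your own namespace and assembly.

```xml
<configuration>
  <system.webServer>
    <modules>
      <!-- Placeholder type/assembly names: replace with your own namespace and assembly -->
      <add name="UploadScanModule" type="MySite.Business.UploadScanModule, MySite" />
    </modules>
  </system.webServer>
</configuration>
```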

But - how do I test it the best way? Do I download a virus intentionally? No, you are in luck. Somebody also thought of that. There is an anti-malware test file standard called EICAR - and you can easily download harmless EICAR sample files that will be flagged as severe malware.
The tricky bit, I found, was to make sure my local Windows Defender did not quarantine the file automatically the moment it discovered I had downloaded it :-) This is the final result after trying to submit an EICAR file.
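
If you would rather not keep a sample file on disk at all, a test could build the EICAR string in memory and post it straight to the site. The sketch below is only an illustration: the class name, the form field name "upload" and the file name are made up, a real Forms submission carries more fields than this, and I split the string in two purely as a precaution against the source file itself being flagged.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical test helper, not part of the module itself.
public static class EicarUploadTest
{
    // The standard 68-character EICAR test string, kept in two halves.
    private const string EicarPart1 = @"X5O!P%@AP[4\PZX54(P^)7CC)7}$";
    private const string EicarPart2 = "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";

    public static byte[] BuildPayload()
    {
        return Encoding.ASCII.GetBytes(EicarPart1 + EicarPart2);
    }

    // Posts the test file as multipart/form-data and returns the HTTP status code.
    public static async Task<int> PostEicarAsync(string siteUrl)
    {
        using (var client = new HttpClient())
        using (var form = new MultipartFormDataContent())
        {
            // "upload" is an assumed field name; the module scans every file part regardless
            form.Add(new ByteArrayContent(BuildPayload()), "upload", "eicar.com.txt");
            var target = new Uri(new Uri(siteUrl), "/EPiServer.Forms/DataSubmit/Submit");
            var response = await client.PostAsync(target, form);
            return (int)response.StatusCode;
        }
    }
}
```

With the module registered, I would expect PostEicarAsync to come back with 400, since that is the status the module sets when a file is flagged.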

CMS 12, .NET Core and alternative approaches

I haven't done a CMS 12 implementation yet - perhaps that would be a good topic for another blog post. However, off the top of my head it could probably be achieved in a similar fashion, but with a custom middleware implementation instead of the HttpModule. In .NET Core we could use the MultipartReader to parse the incoming stream.
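
To make that idea a bit more concrete, here is an untested sketch of what such a middleware might look like. The class name and the scanner delegate are my own inventions, and I have left out the check on the Forms submit path since I haven't verified it for CMS 12.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.WebUtilities;
using Microsoft.Net.Http.Headers;

// Hypothetical middleware: scans every uploaded file before the rest of the pipeline runs.
public class UploadScanMiddleware
{
    private readonly RequestDelegate _next;
    // Assumption: the actual scanner (Defender, ClamAV, ...) is passed in as a delegate.
    private readonly Func<byte[], string, bool> _isMalware;

    public UploadScanMiddleware(RequestDelegate next, Func<byte[], string, bool> isMalware)
    {
        _next = next;
        _isMalware = isMalware;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var request = context.Request;

        // In practice you would also restrict this to the form submit path.
        if (HttpMethods.IsPost(request.Method)
            && MediaTypeHeaderValue.TryParse(request.ContentType, out var mediaType)
            && mediaType.MediaType.Equals("multipart/form-data", StringComparison.OrdinalIgnoreCase)
            && mediaType.Boundary.HasValue)
        {
            // Make the body rewindable so later middleware can read it again
            request.EnableBuffering();
            var boundary = HeaderUtilities.RemoveQuotes(mediaType.Boundary).Value;
            var reader = new MultipartReader(boundary, request.Body);

            MultipartSection section;
            while ((section = await reader.ReadNextSectionAsync()) != null)
            {
                var file = section.AsFileSection();
                if (file == null) continue; // ordinary form field, not a file

                using var buffer = new MemoryStream();
                await file.FileStream.CopyToAsync(buffer);
                if (_isMalware(buffer.ToArray(), file.FileName))
                {
                    context.Response.StatusCode = StatusCodes.Status400BadRequest;
                    await context.Response.WriteAsync("Malware detected in uploaded file.");
                    return;
                }
            }

            // Nothing was flagged: rewind so the form handling can read the body as usual
            request.Body.Position = 0;
        }

        await _next(context);
    }
}
```

It would be registered early in the pipeline with something like app.UseMiddleware<UploadScanMiddleware>(scanner), where scanner is whatever Func<byte[], string, bool> you end up implementing.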

In .NET Core, however, we cannot assume we are running on Windows - and that we have access to Windows Defender. So we might need to look around for an alternative, for example ClamAV, which is freely available and can run in a Docker container. There is a good .NET Core example available for that here.
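
As a rough illustration of the ClamAV route (and not the example linked above), the sketch below talks to a clamd daemon directly over TCP using its INSTREAM command, with no extra library. The class name, the host and the port 3310 are assumptions, and anything other than an OK reply is treated as not clean so that errors fail closed.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for scanning bytes with a clamd daemon (e.g. running in Docker).
public static class ClamdScanner
{
    // INSTREAM framing: command, one chunk (4-byte big-endian length + bytes), then a zero-length chunk.
    public static byte[] BuildInstreamRequest(byte[] content)
    {
        using var ms = new MemoryStream();
        var command = Encoding.ASCII.GetBytes("zINSTREAM\0");
        ms.Write(command, 0, command.Length);
        var len = content.Length;
        var prefix = new[]
        {
            (byte)((len >> 24) & 0xFF), (byte)((len >> 16) & 0xFF),
            (byte)((len >> 8) & 0xFF), (byte)(len & 0xFF)
        };
        ms.Write(prefix, 0, prefix.Length);
        ms.Write(content, 0, content.Length);
        ms.Write(new byte[4], 0, 4); // zero-length chunk ends the stream
        return ms.ToArray();
    }

    // clamd answers "stream: OK" for clean content and "stream: <signature> FOUND" otherwise.
    public static bool IsClean(string reply)
    {
        return reply.TrimEnd('\0', '\n').EndsWith("OK", StringComparison.Ordinal);
    }

    public static async Task<bool> IsMalwareAsync(byte[] content, string host = "localhost", int port = 3310)
    {
        if (content.Length == 0) return false; // nothing to scan

        using var client = new TcpClient();
        await client.ConnectAsync(host, port);
        using var stream = client.GetStream();

        var request = BuildInstreamRequest(content);
        await stream.WriteAsync(request, 0, request.Length);

        // clamd closes the connection after replying, so reading to the end returns the reply
        using var reader = new StreamReader(stream, Encoding.ASCII);
        var reply = await reader.ReadToEndAsync();
        return !IsClean(reply);
    }
}
```

Keep in mind that clamd enforces a maximum stream size, so large uploads would come back as an error and, with this sketch, be rejected.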

There are also some other good alternative approaches:

Microsoft Azure offers Defender for Cloud, which can scan whatever you upload to a blob storage. However, it might be tricky to catch the scan result in the response and let the uploading user know of the issue right away.
Malware scanning at the edge (CDN). For example, Cloudflare, which all Optimizely DXP customers have as a CDN, can be configured to automatically scan uploaded files for malware and make sure they never reach the web server. This is by far the most tempting option in my eyes, but I'm not sure if it's configured for DXP customers or if there is any extra cost associated with it. I'm also not sure if and how end users are informed that they are uploading malware (which I think is important).
In a similar way, I was playing with the idea of setting up a completely separate server/app service/Azure Function to receive all Episerver Forms postings and proxy them on to the web server after scanning them, so no infected bytes would ever reach it. But that comes with its own set of challenges.
