In almost any nook and cranny of government, you will find similar handbooks, guidebooks, rulebooks and other books that are crying out for a user-friendly search layer. Elasticsearch is an excellent choice for such a project, but it often needs to be integrated with the government's (usually Windows-based) servers. Kudos to 18F for introducing Linux in many of their projects, but for now, those projects are the exception. For the rest of us, using Elasticsearch on Windows presents a number of challenges. All of these can be overcome, though documentation for the solutions is scarce.
In my last post, I described how data from an MS SQL Server can be imported into and indexed in Elasticsearch. In future posts, I will discuss some indexing and analysis tricks that can make the search experience smoother (like how to avoid retrieving duplicate results). This post is dedicated to the security of the Elasticsearch cluster on Windows.
Elasticsearch lacks built-in security
Out of the box, Elasticsearch:
- Uses a separate port (default = :9200) for requests and responses. Searching directly from the browser would require opening the port to web traffic on the server.
- Allows data to be changed and deleted, in addition to being retrieved. This is great for a developer who can quickly create, delete and update a search index. You don't usually want your users to have these superpowers, though.
- Provides data about the entire search cluster and all indexes on the cluster. That's like having a SQL server open to the world. While this may please Bobby Tables, it makes the rest of us uncomfortable.
Another option is to use features of the IIS webserver itself. For this, you will have to build a search application interface, but for any practical search project you will want to do that anyway. I have stripped down one such GUI to focus on text-based document searches. This starter application (on GitHub) takes user input, formulates searches and returns a paginated list of results. Do let me know if you use it: I have discovered many useful improvements to satisfy specific client needs.
Using a Reverse Proxy on IIS
With IIS serving your main application, you can create a reverse proxy on IIS for Elasticsearch requests. The reverse proxy translates incoming web requests into internal requests to Elasticsearch. How does this work? Your application's requests to https://www.mydomain.com/search are routed by IIS internally to https://localhost:9200/_search. The internal address is not accessible to outside web traffic.
Advantages:
- No need to expose another port. All traffic can be routed through a URL on your default web port (:80, or :443 for HTTPS).
- Block delete and change requests: simply don't set up any URLs that will route these requests to Elasticsearch.
- Use IIS security features as needed for your main application (e.g. Anonymous login or Windows login).
Install Web Platform Installer
Pretty straightforward, from a Microsoft downloads page.
Install ARR and URLRewrite modules
Open the Web Platform Installer interface, search for "ARR" and "URLRewrite" respectively, and follow instructions to install each of them.
Create the URLRewrite rules for your website
- Open the IIS Server Manager and navigate to your website in the Connections column on the left side. Double click on the Application Request Routing Cache icon (ARR, under IIS, in the Features View).
- Open Server Proxy Settings on the right side of ARR and make sure the checkbox is selected for Enable Proxy. Close the ARR menu. Double click on the URLRewrite icon. Click Add Rule -> Blank Rule.
- Write the rewrite rules for Elasticsearch. The main rewrite rule will use Regular Expressions to match a pattern like: search/_search(.*), and rewrite it as https://localhost:9200/_search{R:1}
- You may want to expose other Elasticsearch APIs as well, and it is best to create a rewrite URL for each of them. For example, to check the health of the cluster, match search/_cluster/health to https://localhost:9200/_cluster/health. If you are having trouble using the IIS Manager UI to enter these rules, consult the blog I referred to earlier, or directly add the rules to your web.config file.
- When you are done entering the rewrite rules, you will have an XML file in your website folder called web.config that will include a <rewrite> section. It should look something like this file: https://gist.github.com/aih/8f2b8d76b44d8836bd77
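Before touching IIS, it can help to check on paper which paths your rules will forward. The following is a rough Python simulation of the two rules described above, not the IIS implementation itself. The rule table, the rewrite function name, and the ^ and $ anchors are my own assumptions for illustration. It follows two documented behaviors of the URL Rewrite module: patterns are matched against the path without its leading slash, and the original query string is appended to the rewritten URL by default.

```python
import re

# Hypothetical rule table mirroring the two rewrite rules described above.
# {R:1} stands for the first capture group, as in IIS URL Rewrite.
# The ^ and $ anchors are an assumption added for this sketch.
REWRITE_RULES = [
    (r"^search/_search(.*)", "https://localhost:9200/_search{R:1}"),
    (r"^search/_cluster/health$", "https://localhost:9200/_cluster/health"),
]


def rewrite(path_and_query):
    """Return the internal URL for a request, or None if no rule matches."""
    # IIS matches the pattern against the path only, without the leading
    # slash; the query string is handled separately.
    path, _, query = path_and_query.partition("?")
    path = path.lstrip("/")
    for pattern, template in REWRITE_RULES:
        match = re.search(pattern, path)
        if match is None:
            continue
        target = template
        # Substitute {R:0}, {R:1}, ... with the corresponding capture groups.
        for index in range(match.re.groups + 1):
            target = target.replace("{R:%d}" % index, match.group(index) or "")
        # Re-append the original query string, as IIS does by default.
        if query:
            target += "?" + query
        return target
    return None


print(rewrite("/search/_search?q=handbook"))
print(rewrite("/search/_cluster/health"))
print(rewrite("/search/myindex"))  # no rule matches, so nothing is forwarded
```

The last call shows the point of the approach: a path with no matching rule never reaches Elasticsearch, so any API you have not explicitly listed stays internal.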
Test it out
From your application, you should now be able to submit a query (either as query parameters in the URL, or with a JSON query payload) to https://[www.yoursite.com]/search, and get the response from Elasticsearch. Note that now the path at /search is reserved for Elasticsearch traffic, so your application cannot use that path (e.g. for a webpage). If this is a problem, you can use any path you prefer for your reverse proxy settings.
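As a quick check from outside the server, a JSON query can be sent through the proxy with a few lines of Python. This is a minimal sketch under stated assumptions: the host name is a placeholder, the /search/_search path assumes the rewrite pattern shown earlier, and the function names and the choice of a query_string query are mine rather than part of the starter application.

```python
import json
import urllib.request

# Placeholder address: replace with your own site and proxy path.
PROXY_URL = "https://www.yoursite.com/search/_search"


def build_search_request(base_url, text, size=10):
    """Build a POST request carrying a JSON query payload."""
    body = json.dumps({
        "query": {"query_string": {"query": text}},
        "size": size,
    })
    return urllib.request.Request(
        base_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, text):
    """Send the query through the reverse proxy and decode the JSON reply."""
    request = build_search_request(base_url, text)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a reachable proxy):
# results = search(PROXY_URL, "handbook")
# print(results["hits"]["total"])
```

If the request succeeds here but a direct request to port 9200 from the same machine is refused, the proxy is doing its job.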
Your thoughts?
The Elastic team described setting up a reverse proxy with Nginx for many of the same reasons, and this does seem like a clean way to expose only the search API to external web traffic. Are there other architectures you have used for Elasticsearch on Windows? Do you see security vulnerabilities with the approach I've described? I'd like to hear your thoughts in the comments.