One reason distributed programming technologies like RMI are difficult to get working in practice is the presence of network security technology such as firewalls. A typical firewall prevents applications from communicating except over specific ports (and commonly those ports do not include the ones we want to use), and may also inspect the traffic that is actually using the port (application-level firewalls). One solution is tunnelling: using a port that is very likely to be open instead, and `masquerading' as the kind of traffic that would be expected on it. Commonly, this will be port 80, and we'll use HTTP. It turns out that very little actual work is involved in doing this with RMI, as it will automatically try to tunnel if connecting directly does not work.
Security is currently a big issue - large organisations should take it seriously, and so should you - even dialup users are at risk. The firewall concept has been developed to protect an organisation's machines. It serves two purposes.
We can think of a number of typical approaches that might be taken.
The actual policies implemented by firewalls vary. They might just allow or deny access to particular ports (that is, they assume ports are being used for their intended purpose). They might actually monitor the traffic more closely - for example, stateful packet inspection, which checks to see if network packets 'belong' to (legitimate) open connections.
Although generally a good thing, firewalls do limit access, and it is more than likely that the ports you propose to use for remotely invoking methods are closed to you - even if they are the default ports used. You could of course try to convince whoever is in charge of security to change the policy to allow access. How difficult this is depends on your situation. You may well be the person in charge, in which case it is easy - assuming you conclude it is actually a secure thing to do. In a big organisation it may be next to impossible. The alternative is to cheat and disguise your traffic as HTTP traffic (on whatever port it is permitted). This is a process known as tunnelling, and we will look at how RMI does it in the next chapter. (In fact, RMI more or less tries to do it by default - so little work is actually involved.) However, in this chapter we will look at an XML based solution to the problem - the Simple Object Access Protocol, or SOAP.
By default, RMI uses the Java Remote Method Protocol (JRMP) - though it can also use CORBA's Internet Inter-ORB Protocol (IIOP). Both JRMP and IIOP are likely to have problems with firewalls, and hence we must sometimes resort to HTTP tunnelling to pass through them. You may wonder: if this is such a problem, why bother with anything other than tunnelling at all? There are a number of reasons.
For these reasons, RMI tries a number of ways of connecting to a server, and it orders these attempts so the most efficient and reliable are tried first. Only if these do not work does it attempt successively more complex tunnelling methods.
RMI has five strategies that it tries to use to connect to a server.
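The general pattern of trying the cheapest connection method first and falling back to more elaborate ones can be sketched as follows. This does not reproduce RMI's own strategy list or its internal code; the class name `ConnectFallback`, the method `firstSuccessful`, and the strings used in `main` are all hypothetical, chosen only to illustrate the ordering idea.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class ConnectFallback {

    /**
     * Runs each attempt in order and returns the first non-empty result.
     * Later (more expensive) attempts are only run if earlier ones fail.
     */
    static <T> Optional<T> firstSuccessful(List<Supplier<Optional<T>>> attempts) {
        for (Supplier<Optional<T>> attempt : attempts) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical attempts: the direct one "fails", the tunnelled one "succeeds".
        List<Supplier<Optional<String>>> attempts = List.of(
                () -> Optional.empty(),
                () -> Optional.of("connected via HTTP tunnel"));
        System.out.println(firstSuccessful(attempts).orElse("no route to server"));
    }
}
```

In a real client each attempt would try to open a connection and return it on success; here strings stand in for connections so the sketch stays self-contained.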
You might be thinking at this point: wouldn't it be easier to get the sysadmin to open the required ports in the firewall? Well, maybe. But often they will be reluctant to do this. In a large organization they typically live under a state of perpetual attack from malicious, or just curious but still potentially dangerous, outsiders. They are unlikely to want to open what they see as security holes that would put the entire organization at risk, to help with something that may well seem unimportant from their point of view. (As an example, RMI does not work on the University wireless network - do you think it's reasonable to open RMI access to accommodate the, typically, 50 to 100 people on this module, when doing so would potentially affect 12000 others?) Finally, even if they want to, they may not be able to, because what you are asking for may conflict with other services.
It may seem so far that we need do nothing to get tunnelling to work. However, that is not necessarily true. In the case that a web server intervenes between our client and server code, we need some mechanism for the method invocation requests to be passed on - that is, the web server must be able to recognize that the `HTTP request' is not really intended for the web server but for the RMI server, and must also have a means of transferring the request. The mechanism used is to package the RMI request as an HTTP POST request for a specific URL:
/cgi-bin/java-rmi.cgi
We must ensure that an appropriate application is located at this URL, and in the usual case that it's a Java servlet, that a servlet engine is available and running.
As the URL implies, we could use a CGI - Common Gateway Interface - program, which is an older technology. All it would have to do is the following.
Early versions of RMI did just this, and a CGI program was provided. However, since the inception of the more efficient and secure Java Servlets, it has been more usual to employ one of them. Again, one is provided by Sun.
To get Java servlets to work, you need a servlet engine - which you typically don't get by default, so you have to download and install one. There are a number of servlet options available - the usual choice is probably Apache together with the Jakarta Tomcat servlet engine. (The Apache Jakarta project is an umbrella for all their server-side Java tools - of which Tomcat is just one.) Typically, the servlet engine runs in a process of its own, separate from the actual web server, called the servlet runner. Actual servlets will generally be run as threads within the servlet runner.
What does a servlet typically consist of? It will extend the pre-existing HttpServlet class, and override the methods within it for handling the various HTTP requests. For example, there are doGet and doPost methods. Each takes two parameters: the first is a reference to an HttpServletRequest object, which contains the HTTP request; the second is a reference to an HttpServletResponse object, in which the response will be constructed. The default servlet implementation for RMI tunnelling overrides the doPost method to decode and check the request; forward it to the appropriate server; and collect the resulting response. Forwarding and collecting the response is simply done by communicating over a socket. (It doesn't need to implement doGet since RMI forwarding doesn't use GET.)
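The shape of that doPost logic (check the request, forward the body, write back the reply) can be sketched without a servlet engine. The servlet API is not part of the JDK, so this sketch swaps in the JDK's built-in `com.sun.net.httpserver` as a stand-in; it is not Sun's servlet and not the servlet API. The class name `PostForwardHandler`, the `start` method, and the `forwarder` parameter are hypothetical. In a real servlet the body of `handle` would live in doPost.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.function.UnaryOperator;

final class PostForwardHandler {

    /** Starts a local HTTP server; 'forwarder' stands in for the socket hop to the RMI server. */
    static HttpServer start(int port, UnaryOperator<byte[]> forwarder) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/cgi-bin/java-rmi.cgi", exchange -> handle(exchange, forwarder));
        server.start();
        return server;
    }

    private static void handle(HttpExchange exchange, UnaryOperator<byte[]> forwarder)
            throws IOException {
        try {
            // Tunnelled RMI calls arrive as POST; anything else is rejected.
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // 405 Method Not Allowed, no body
                return;
            }
            byte[] request = exchange.getRequestBody().readAllBytes();
            byte[] reply = forwarder.apply(request);
            if (reply.length == 0) {
                exchange.sendResponseHeaders(200, -1); // empty reply, no body
                return;
            }
            exchange.getResponseHeaders().set("Content-Type", "application/octet-stream");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        } finally {
            exchange.close();
        }
    }
}
```

Passing the forwarding step in as a function keeps the HTTP handling separate from the question of where the request is sent, which is the part one would most likely want to replace.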
The servlet obviously needs to know which port the RMI server is listening on - so the client has to tell it as part of the URL. The URL the client will actually use will be of the form:
/cgi-bin/java-rmi.cgi?forward
meaning use the default port, or:
/cgi-bin/java-rmi.cgi?forward=portNum
meaning use the specified port (portNum). (If you know anything about HTTP requests, you might have noticed that even though we are using a POST request, we can still pass GET-type information as part of the URL.)
There are two basic problems with the default servlet. First, there is an assumption that the RMI server object sits on the same machine as the web and servlet servers. You can see this from the URL: /cgi-bin/java-rmi.cgi does not include, and has no provision for, a hostname (or equivalently an IP address). To be fair, this is not a problem with the default servlet but with the underlying RMI tunnelling mechanism. However, you can change the servlet to fix it, so we will not consider it here. (Like the RMI registry, the default servlet is meant as an illustrative example - sophisticated users are expected to replace it.) Secondly, the default servlet passes on the request directly to an RMI server (or tries to). But what if the servlet is actually sitting on a machine in the De-Militarized Zone (DMZ) between two firewalls? Serious operational security often involves two layers of firewalls: the first providing protection for servers that actually need to talk to the outside world (web, mail etc.); the second further protecting internal servers, desktops and so on. In this case, the servlet should actually try to tunnel through the second firewall to another machine instead of attempting to contact an RMI server directly (unless of course the RMI server is also sitting between the two firewalls).
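The two URL forms above can be built and decoded with a few lines of string handling. This is a minimal sketch, not the code RMI or the default servlet uses: the class `ForwardUrl` and its methods are hypothetical names, and 1099 appears only as an example port value.

```java
import java.util.OptionalInt;

final class ForwardUrl {
    static final String CGI_PATH = "/cgi-bin/java-rmi.cgi";

    /** Request path asking the servlet to forward to the default port. */
    static String withDefaultPort() {
        return CGI_PATH + "?forward";
    }

    /** Request path asking the servlet to forward to a specific port. */
    static String withPort(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return CGI_PATH + "?forward=" + port;
    }

    /**
     * Decodes the query string (the part after '?').
     * Returns the port for "forward=portNum", or empty for plain "forward".
     */
    static OptionalInt parsePort(String query) {
        if (query == null || !query.startsWith("forward")) {
            throw new IllegalArgumentException("not a forward command: " + query);
        }
        String rest = query.substring("forward".length());
        if (rest.isEmpty()) {
            return OptionalInt.empty(); // caller should fall back to the default port
        }
        if (rest.charAt(0) != '=') {
            throw new IllegalArgumentException("not a forward command: " + query);
        }
        int port = Integer.parseInt(rest.substring(1)); // NumberFormatException if not numeric
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return OptionalInt.of(port);
    }

    public static void main(String[] args) {
        System.out.println(withPort(1099)); // prints /cgi-bin/java-rmi.cgi?forward=1099
    }
}
```

Rejecting anything that is not a well-formed forward command matters on the server side, since the query string comes from an untrusted client.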
To fix these problems, you need to write your own servlet - or at least override the static execute method in ServletForwardCommand (and perhaps a few others as well).
To fix the first problem, you simply need to get execute to open a connection to a remote host instead of locally. (Though bear in mind that the real code is significantly more complex than the cut-down and simplified example we have shown.)
The second problem is a bit more complicated to solve - at first sight it looks like you would have to duplicate the approach of trying five different ways to communicate with the server. In practice, it is unlikely to be that complicated - if you have a DMZ most of the five are likely to be non-starters, and you may well know precisely which method will work. (However, bear in mind that your code will be more robust in the face of changes to the firewall if you implement others as well.) The actual forwarding process just involves opening a socket connection (on the RMI port in use, or port 80 as appropriate) to the appropriate machine (either the RMI server or the firewall/proxy), sending the data down it, and collecting the response.
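The forwarding step just described (open a socket, send the data, collect the response) might look roughly like this. It is a simplified sketch and not the code in ServletForwardCommand: the class `SocketForwarder` and method `forward` are hypothetical names, and the sketch assumes the peer reads until end of input and closes the connection after replying, which a real RMI endpoint need not do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

final class SocketForwarder {

    /**
     * Sends 'request' to host:port and returns everything the peer writes back.
     * 'host' may be the RMI server itself or an intermediate firewall/proxy.
     */
    static byte[] forward(String host, int port, byte[] request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request);
            out.flush();
            // Assumption: half-closing tells the peer the request is complete.
            socket.shutdownOutput();

            // Collect the response until the peer closes its side.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toByteArray();
        }
    }
}
```

Because the host is a parameter rather than fixed to the local machine, the same method covers both fixes discussed above: contacting an RMI server on another machine, or handing the request to the next firewall or proxy in a DMZ setup.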