Chapter 13 Network Communication
Every programming tool has a dirty little secret that makes using it less pleasant than the walk in the park promised by the marketing brochures. Basic has its spaghetti code and C has its wild pointers. Java is no different. The dirty little secret of Java applet writing is that browser security mechanisms make it nearly impossible to write a useful Java applet that doesn’t connect back via sockets to a daemon on the server. Specifically, browsers can, and do, prevent applets from doing any file I/O. This means that any persistence that an applet requires has to be implemented in a server application (not an applet) to which the applet talks over the Net.
Java provides network communication through the package java.net. This package contains a number of useful classes. The basic ones that we’re going to use and talk about are URL, Socket, ServerSocket, and InetAddress. The class hierarchy for the java.net package is shown in Figure 7.1.
Another package that is inextricably bound up with network I/O and I/O in general is the aptly named java.io package. As the various examples will show, whenever you do any kind of I/O, you will have to use one of the Stream classes provided in the java.io package. Figure 7.2 shows the java.io class hierarchy.
Using URLs
A URL, or Uniform Resource Locator, is basically just a network location. It tells you not only where something is, but what it is. For example, consider my home page URL:
http://www.channel1.com/users/ajrodley/index.html
Briefly, the URL says that to use this “document,” you connect to the http server on the machine named www.channel1.com, then tell it to send a stream of data made out of the file “/users/ajrodley/index.html.”
Java’s URL class takes this concept of an Internet location one step further. After all, a URL by itself can easily be represented by a String. A Java URL object, on the other hand, encompasses not just the address but also access to the object at that address. Let’s investigate URLs in more detail.
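As a small illustration of the breakdown above, the following sketch splits the home-page URL into its parts with java.net.URL's getProtocol, getHost, and getFile accessors. The class name UrlParts is made up for this example. It runs without touching the network, because building a URL object only parses the string; fetching happens later, when you ask for the content.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    // Splits a URL string into protocol, host, and file. Constructing a URL
    // only parses the string; no network connection is made here.
    public static String[] parts(String spec) {
        try {
            URL u = new URL(spec);
            return new String[] { u.getProtocol(), u.getHost(), u.getFile() };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        String[] p = parts("http://www.channel1.com/users/ajrodley/index.html");
        System.out.println("protocol = " + p[0]); // http
        System.out.println("host     = " + p[1]); // www.channel1.com
        System.out.println("file     = " + p[2]); // /users/ajrodley/index.html
    }
}
```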
Word-Searching a URL
Many organizations are sitting on huge piles of information that they want to make available via the Web. After all, it’s a perfect application of the technology, freeing information consumers from the time-consuming task of hauling themselves physically to the location (library, town hall ...) where the hard-copy information resides. For most companies in this situation, the easiest approach is to dump the information onto the Web virtually unmodified from the hard-copy version. Although this is a valid approach, Java makes it easy to go one step further.
As a first step in adding value to a huge, raw document, you might want to add a word-search applet like the one shown in Listing 7.1, which scans a document for a particular word. Listing 7.2 shows the HTML code to load this applet.
Listing 7.1 Searching a Document for a Word
package chap7;
import java.awt.Graphics;
import java.awt.*;
import java.applet.Applet;
import java.net.*;
import java.lang.*;
import java.io.*;
import java.util.*;
/** A class to search for a word in a text document using only
URL.getContent to retrieve the document into a String.
@author John Rodley
@version 1.0 1/1/1996
*/
public class ch7_fig1 extends Applet {
String stringURL;
String desiredWord;
public final static String buttonLabel =
new String("Start Search");
TextField wordField;
Panel controlPanel;
Button searchButton;
/** Get the parameter "URL" from the applet tag, create a
panel with a text field for entering the word to search for,
and a button for starting the search.
*/
public void init() {
// getParameter returns null if the URL param is missing
stringURL = getParameter( "URL" );
if( stringURL == null ) {
System.out.println( "URL parameter not set" );
return;
}
controlPanel = new Panel();
searchButton = new Button( buttonLabel );
controlPanel.add( searchButton );
wordField = new TextField( 25 );
controlPanel.add( wordField);
add( controlPanel );
}
/** Do the search, using the URL provided in the URL parameter
and the search phrase set previously in the text field. Skip
the search if either of these variables is not set.
*/
void dosearch() {
if( desiredWord == null || desiredWord.length() == 0 ) {
System.out.println( "search word not set" );
return;
}
try {
URL u = new URL(stringURL);
try {
Object o = u.getContent();
if( o instanceof String ) {
FindTheWord( desiredWord, (String)o );
}
}
catch( IOException e ) {System.out.println( "ioex "+e);}
}
catch( MalformedURLException ue ) {System.out.println( "urlex "+ue);}
}
/** Search the supplied string for the specified sub-string.
Find by character position in the doc, as well as line
position. Report results to standard output.
@param find the phrase to search for
@param doc a string that contains the ENTIRE document
*/
void FindTheWord( String find, String doc ) {
int ret = doc.indexOf( find );
if( ret != -1 ) {
showStatus( "Found "+find+" at char offset "+ret );
System.out.println( "found "+find+" at char offset "+ret );
}
// Split the doc into lines using a tokenizer with only \r
// and \n as the delimiters.
StringTokenizer lines = new StringTokenizer( doc, "\r\n" );
int lineNo = 1;
while( lines.hasMoreElements()) {
String line = (String)lines.nextElement();
// Create a second tokenizer, with space as the delimiter
StringTokenizer words = new StringTokenizer( line );
while( words.hasMoreElements()) {
String word = (String)words.nextElement();
if( word.toUpperCase().compareTo( find.toUpperCase())==0)
{
showStatus("found "+find+" on line "+lineNo );
System.out.println( "found "+find+" on line "+ lineNo);
}
}
lineNo++;
}
}
/** Handle the search button, starting a new search
if the start button is pushed.
*/
public boolean handleEvent(Event e) {
if (e.id == Event.ACTION_EVENT) {
if( e.target instanceof Button )
{
Button b = (Button )e.target;
if( b.getLabel().compareTo( buttonLabel ) == 0 )
{
desiredWord = new String( wordField.getText());
dosearch();
}
}
}
return( false );
}
}
Listing 7.2 The HTML Code That Loads the Word Finder Applet
<!DOCTYPE HTML PUBLIC "-//SQ//DTD HTML 2.0 HoTMetaL + extensions//EN">
<HTML><HEAD><TITLE>Chapter 7 - figure 1 - Finding a word in a
simple network text file.</TITLE>
</HEAD>
<BODY>
<applet code="chap7/ch7_fig1.class" width=600 height=600>
<param name=URL value="http://www.mymachine.com/temp/index.txt">
</applet>
</P>
</BODY></HTML>
The <applet> tag on the HTML page supplies the URL of the document to scan, a TextField collects the word to search for, and a Button starts the search.
Note one thing about Listing 7.2: the document we’re scanning (index.txt) is plain text, not an HTML document. We’ll see why that matters later.
Setting up the search requires getting the URL of the document we’re scanning and a word to scan for. In the init method, we get the URL via the getParameter method. We also set up a text field for entering the search word and a button for starting the search. This text field and button arrangement requires us to implement a handleEvent method. As you can see, the event handler deals with a single event among the many that can occur: the user pressing the button labeled “Start Search.”
Once we have a word to search for and a string URL of a document to search, the next logical step is to retrieve the document. This is the job of the first part of the dosearch method, specifically the calls to the URL constructor and URL.getContent. The URL constructor does not actually connect to anything; it only parses our string URL. If that string isn’t a valid URL (it names an unknown protocol, for example), the constructor throws a MalformedURLException, which we are required to catch. The real work happens in the call to getContent, where most of the functionality of this applet is actually embedded: it connects to the server and retrieves the document, throwing an IOException if the document doesn’t exist or the connection fails. When the file we’re pointing at is simple text, what we get back from getContent is a String that holds the entire text of that file.
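The difference between parsing and connecting is easy to check. This hedged sketch (the class UrlConstructorDemo and the host no-such-host.invalid are made up for illustration) shows that a URL naming a nonexistent host still constructs fine, while strings that cannot be URLs at all are what trigger MalformedURLException.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlConstructorDemo {
    // True if the string parses as a URL. Nothing is fetched, so a host
    // that does not exist still parses; MalformedURLException is reserved
    // for strings that cannot be a URL at all.
    public static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("http://no-such-host.invalid/index.txt")); // true
        System.out.println(parses("nosuchprotocol://host/index.txt"));       // false: unknown protocol
        System.out.println(parses("index.txt"));                             // false: no protocol
    }
}
```

Errors about missing documents or unreachable hosts therefore show up later, as IOExceptions from the call that actually opens the connection.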
Having read our whole file into a String, we simply pass that String into the method FindTheWord that searches for a word occurrence. Within FindTheWord, we use two different ways to find our word within the String. The first way is an exact match via String.indexOf:
int ret = doc.indexOf(find);
if( ret != -1 ) {
showStatus( ... );
This approach searches the whole String, whitespace included, for the first occurrence of the word and returns its character offset within the String (or -1 if the word isn’t there). Unfortunately, knowing the character offset of a word within a file is rarely useful.
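What’s usually called for in word searches is a line number within the document. Thus, within FindTheWord, we also implement a second, more useful search algorithm. This one uses a StringTokenizer, with carriage return and linefeed as the delimiters, to break the String representing our whole file into lines. A second StringTokenizer then breaks each line into words, and we compare each word to the desired word. One caveat: StringTokenizer treats a run of delimiters as a single separator, so blank lines are skipped entirely and the reported line numbers come out low for any document that contains blank lines.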
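One way around the blank-line problem is to let a line reader split the document instead of a tokenizer. The sketch below is an alternative, not the chapter's code; the class name LineSearch is hypothetical. It reads the String through BufferedReader over a StringReader, whose readLine returns an empty string for a blank line, so every line is counted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class LineSearch {
    // Returns the 1-based numbers of the lines on which `find` appears as a
    // whole whitespace-delimited word, ignoring case. BufferedReader.readLine
    // returns empty strings for blank lines, so they are counted too.
    public static List<Integer> findLines(String doc, String find) {
        List<Integer> hits = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(doc))) {
            String line;
            int lineNo = 1;
            while ((line = r.readLine()) != null) {
                StringTokenizer words = new StringTokenizer(line);
                while (words.hasMoreTokens()) {
                    if (words.nextToken().equalsIgnoreCase(find)) {
                        hits.add(lineNo);
                    }
                }
                lineNo++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected from a StringReader
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "alpha\r\n\r\nbeta gamma\r\n";
        System.out.println(findLines(doc, "BETA")); // [3]
        // A "\r\n"-delimited StringTokenizer sees only two lines here,
        // so it would report "beta" on line 2.
        System.out.println(new StringTokenizer(doc, "\r\n").countTokens()); // 2
    }
}
```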
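The two search algorithms, indexOf and compareTo, will often find different things. indexOf finds the word inside another word (“cat” inside “concatenate”) and is case-sensitive, while the tokenizer-and-compareTo approach matches only whole words. Because both sides are converted with toUpperCase first, compareTo matches words regardless of case. On the other hand, it misses a word with punctuation attached, since the token “cat,” (comma included) is not equal to “cat”.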
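To see the two behaviors side by side, this sketch counts matches both ways on a few tiny inputs. MatchComparison is a hypothetical name, and equalsIgnoreCase stands in for the toUpperCase-plus-compareTo test, which behaves the same for plain ASCII words.

```java
import java.util.StringTokenizer;

public class MatchComparison {
    // Counts substring occurrences of `find`, case-sensitively, by calling
    // String.indexOf repeatedly (a single call finds only the first one).
    public static int substringHits(String doc, String find) {
        if (find.isEmpty()) return 0; // indexOf("") would match everywhere
        int count = 0;
        int from = doc.indexOf(find);
        while (from != -1) {
            count++;
            from = doc.indexOf(find, from + find.length());
        }
        return count;
    }

    // Counts whitespace-delimited tokens equal to `find`, ignoring case,
    // as the tokenizer-plus-compareTo loop does.
    public static int wordHits(String doc, String find) {
        int count = 0;
        StringTokenizer words = new StringTokenizer(doc);
        while (words.hasMoreTokens()) {
            if (words.nextToken().equalsIgnoreCase(find)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(substringHits("concatenate", "cat") + " " + wordHits("concatenate", "cat")); // 1 0
        System.out.println(substringHits("CAT Cat", "cat") + " " + wordHits("CAT Cat", "cat"));         // 0 2
        System.out.println(substringHits("cat,", "cat") + " " + wordHits("cat,", "cat"));               // 1 0
    }
}
```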
The final thing to notice about our word-search applet is that we haven’t talked at all about sockets, protocols, or daemons. All of the grunt work of connecting over the network and retrieving the file is handled entirely within URL.getContent.
Expanding the Search Beyond Simple Text
If you point the word-search applet at a file with an .htm or .html extension, something very disturbing happens: getContent throws a ClassNotFoundException. This is because there is no content handler for content of type HTML (MIME type text/html). The steps URL.getContent goes through to deliver our String in Listing 7.1 are:
1. Make a connection over the Net via sockets.
2. Create a stream of characters flowing from the server to the applet.
3. Figure out that the stream is simple text and turn that stream of characters into a String object.
The problem with using this applet on HTML content is that Java figures out that the stream is HTML (not simple text), but it doesn’t know how to create a sensible object from a stream of HTML text. This is unfortunate, but not insurmountable. Of the three steps, the first two are still available to us, regardless of the type of the URL’s content. We just have to deal with the stream ourselves, rather than having getContent turn it into a String.
To do the same search and have it work whatever the content type, we have to modify the applet. Listing 7.3 shows those modifications.
Listing 7.3 Searching HTML Documents for a Word
package chap7;
import java.awt.Graphics;
import java.awt.*;
import java.applet.Applet;
import java.net.*;
import java.lang.*;
import java.io.*;
import java.util.*;
/** A class that searches the document specified in the URL
parameter for the word typed into the text field.
@author John Rodley
@version 1.0 1/1/1996
@see URL
@see InputStream
*/
public class ch7_fig3 extends Applet {
String stringURL;
String desiredWord;
public final static String buttonLabel =
new String("Start Search");
TextField wordField;
Panel controlPanel;
Button searchButton;
/** Get the parameter "URL" from the applet tag, create a
panel with a text field for entering the word to search for,
and a button for starting the search.
*/
public void init() {
// getParameter returns null if the URL param is missing
stringURL = getParameter( "URL" );
if( stringURL == null ) {
System.out.println( "URL parameter not set" );
return;
}
controlPanel = new Panel();
searchButton = new Button( buttonLabel );
controlPanel.add( searchButton );
wordField = new TextField( 25 );
controlPanel.add( wordField);
add( controlPanel );
}
/** Handle the search button, starting a new search
if the start button is pushed.
*/
public boolean handleEvent(Event e) {
if (e.id == Event.ACTION_EVENT) {
if( e.target instanceof Button )
{
Button b = (Button )e.target;
if( b.getLabel().compareTo( buttonLabel ) == 0 )
{
desiredWord = new String( wordField.getText());
dosearch();
}
}
}
return( false );
}
/** Open a stream connection to the document at the URL given
in the URL parameter, then pass the stream and the word from
the text field to the FindTheWord method.
@see FindTheWord
@see URL
@see InputStream
*/
public void dosearch() {
try {
URL u = new URL(stringURL);
System.out.println( "u="+u );
try {
InputStream is = u.openStream();
FindTheWord( desiredWord, is );
}
catch( IOException e ) {System.out.println( "ioex "+e);}
}
catch( MalformedURLException e1 )
{System.out.println( "mfuex "+e1);}
}
/** Find a word in an input stream, reporting the line number
to the standard output.
@param find The word we're scanning the input stream for
@param is An input stream
@see DataInputStream
@see StringTokenizer
@see String
*/
void FindTheWord( String find, InputStream is ) {
int lineNo = 1;
DataInputStream dis = new DataInputStream( is );
while( true ) {
try {
String line = dis.readLine();
if( line == null )
break;
StringTokenizer words = new StringTokenizer( line );
while( words.hasMoreElements()) {
String word = (String)words.nextElement();
if( word.toUpperCase().compareTo( find.toUpperCase())
== 0)
{
showStatus("found "+find+" on line "+lineNo );
System.out.println("found "+find+" on line "+lineNo);
}
}
lineNo++;
} catch( IOException e ) {break;}
}
}
}
We use the same basic structure as in Listing 7.1, getting the search word from a text field and the URL from the <applet> tag. The real changes are to our content-handling mechanisms, dosearch and FindTheWord. Within dosearch, we get rid of the call to getContent and use the lower-level call URL.openStream instead:
URL u = new URL(stringURL);
System.out.println( "u="+u );
try {
InputStream is = u.openStream();
This gets us the InputStream that getContent used in Listing 7.1 to create an object appropriate to the URL’s content. You can now see how high-level the URL.getContent method really is. We could easily write our own simple, text-only version of it using the skeleton of FindTheWord, as shown in Listing 7.4.
Listing 7.4 Our Own Version of URL.getContent
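String ourGetContent( InputStream is ) {
    // Accumulate in a StringBuffer rather than creating a new String per line
    StringBuffer content = new StringBuffer();
    DataInputStream dis = new DataInputStream( is );
    while( true ) {
        try {
            String line = dis.readLine();
            if( line == null )
                break;
            // readLine strips the line terminator, so put one back;
            // otherwise the last word of one line runs into the next
            content.append( line ).append( '\n' );
        } catch( IOException e ) {break;}
    }
    return( content.toString() );
}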
The main difference between URL.getContent and our FindTheWord is that we embed a word-search algorithm in FindTheWord. Within FindTheWord, we turn the InputStream that dosearch got from our URL into a DataInputStream. This allows us to read the stream line by line rather than as bytes or byte arrays, which are all the bare InputStream gives us. This technique is a common theme in Java I/O: the basic I/O object gives you an InputStream or OutputStream, which you then turn into a more specialized stream, like DataInputStream, by passing the bare stream to the specialized stream’s constructor, as we do here in FindTheWord:
DataInputStream dis = new DataInputStream( is );
while( true ) {
try {
String line = dis.readLine();
From this point, we use the same word-search techniques as in Listing 7.1 to break each line into words and compare the search word to each word in the document.
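The same wrapping idea carries over to later versions of Java. As of JDK 1.1, DataInputStream.readLine is deprecated because it does not properly convert bytes to characters; the recommended replacement is BufferedReader.readLine, stacked on an InputStreamReader. The sketch below (StreamWrapping is a hypothetical name) shows that stack. A ByteArrayInputStream stands in for the stream url.openStream() would return, and the UTF-8 charset is an assumption, since a real document may use another encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamWrapping {
    // Wraps a bare InputStream in an InputStreamReader (bytes -> chars)
    // and a BufferedReader (chars -> lines), then reads it line by line.
    public static List<String> readLines(InputStream is) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(): no network needed
        InputStream fake = new ByteArrayInputStream(
            "first line\r\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(fake)); // [first line, second line]
    }
}
```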
A Link-Checking Applet
Once you have access to the text of HTML documents on the Net, there is an endless number of interesting tasks you can take on. Because of the complex nature of my Web site, one of the most odious tasks I’ve had to deal with is checking all the links in my pages to make sure there aren’t any dead ones. It’s easy to get dead links in a page: a simple typo in an HREF attribute will do it.
With a multilevel Web site containing many internal links, you really need an applet that will go through all the links from top to bottom, making sure the documents they point to actually exist. We can develop an applet like this by combining the techniques of Listings 7.1 and 7.3, as shown in Listing 7.5. Figure 7.3 shows the link-checking applet in action.
Listing 7.5 A Link-Checking Applet
package chap7;
import java.awt.Graphics;
import java.awt.*;
import java.applet.Applet;
import java.net.*;
import java.lang.*;
import java.io.*;
import java.util.*;
/** A class that prompts the user for a URL, then goes through
the, presumably HTML, content of that URL checking for links
to other WWW content and making sure that the documents those
links connect to actually exist and that all THEIR links are
valid. This is recursive and potentially time-wasting.
@author John Rodley
@version 1.0 1/1/1996
@see TextField
@see Button
@see Panel
*/
public class ch7_fig5 extends Applet {
public static ch7_fig5 c7;
TextField urlEntryField;
Button checkButton;
Button skipButton;
public List lineList;
Panel topPanel;
public boolean bSkip = false;
String buttonLabel = new String( "Check URL" );
String skipLabel = new String( "Skip" );
/** Set up a text field for entering the URL to check, a button
to start the check, and a button to interrupt an undesired
check.
@see Panel
@see Button
@see TextField
*/
public void init() {
c7 = this;
setLayout( new BorderLayout());
topPanel = new Panel();
urlEntryField = new TextField( "http://", 60 );
urlEntryField.setEditable( true );
topPanel.add( urlEntryField );
checkButton = new Button(buttonLabel);
topPanel.add( checkButton );
skipButton = new Button(skipLabel);
topPanel.add( skipButton );
lineList = new List(10, false);
add( "North", topPanel );
add( "Center", lineList );
resize( 700, 400 );
}
/** Handle the start and skip buttons, starting a new search
if the start button is pushed, and breaking out a scan every
time the skip button is pressed.
*/
public boolean handleEvent(Event e) {
if (e.id == Event.ACTION_EVENT) {
if( e.target instanceof Button )
{
Button b = (Button )e.target;
System.out.println( "button "+b );
if( b.getLabel().compareTo( buttonLabel ) == 0 )
clicked();
if( b.getLabel().compareTo( skipLabel ) == 0 ) {
System.out.println( "skip button clicked" );
bSkip = true;
}
}
}
return( false );
}
/** Actually start the search, getting the URL from the
text field, and setting off a recursive LinkFollower object.
@see LinkFollower
*/
public void clicked() {
System.out.println( "Starting check run" );
String stringURL = urlEntryField.getText();
if( stringURL == null ) {
showStatus( "URL ENTRY FIELD CAN NOT BE EMPTY!!" );
return;
}
if( stringURL.compareTo("" ) == 0 )
{
showStatus( "URL ENTRY FIELD CAN NOT BE EMPTY!!" );
return;
}
LinkFollower lf = new LinkFollower( this, stringURL );
lf.start();
}
}
/** A class that recursively follows all the links in a HTML
page to the very end. This has circularity problems that are
not entirely solved, hence the skip button. A Hashtable
contains the list of all links that have been checked, and no
link should be checked twice.
@see Hashtable
@author John Rodley
@version 1.0 1/10/1996
*/
class LinkFollower extends Thread {
String stringURL;
ch7_fig5 c;
static Hashtable hash;
String linkStrings[];
/** Constructor - creates the checked-link Hashtable, and an
array of "keys" that we use to find links - HREF, IMG, and
applet.
@see Hashtable
*/
public LinkFollower( ch7_fig5 ch, String url ) {
hash = new Hashtable();
linkStrings = new String[3];
linkStrings[0] = new String("<A HREF=");
linkStrings[1] = new String("<IMG SRC=");
linkStrings[2] = new String("<applet code=");
c = ch;
stringURL = new String( url );
}
/** The run loop for this LinkFollower thread. This makes the
first call to the recursive method, FollowLinks.
@see FollowLinks
*/
public void run() {
FollowLinks( stringURL );
showOutput( "Check finished" );
}
/** The recursive method that opens a stream from a URL, and
calls FindLinks with that stream as an arg. FindLinks then
calls back to FollowLinks for each found link. For each link,
add a line to the list box in the user interface describing
whether or not the link is valid.
@see URL
@see InputStream
*/
public void FollowLinks( String stringURL ) {
String s;
try {
URL u = new URL(stringURL);
Enumeration en = hash.elements();
for( int i = 0; i < hash.size(); i++ ) {
URL storedU = (URL)en.nextElement();
if( u.sameFile(storedU) == true ) {
s = new String( "already checked -> "+u );
System.out.println( s );
showOutput( s );
return;
}
}
hash.put( u.toString(), u );
try {
try {
InputStream is = u.openStream();
showOutput( "Link to URL OKAY -> "+u );
FindLinks( u.toString(), is );
}
catch( FileNotFoundException e ) {
showOutput( "Link to URL FILE NOT FOUND! -> "+u );
}
} catch( IOException e ) {
showOutput( "Link to URL Error! -> "+u );
}
}
catch( MalformedURLException e1 ) {
showOutput( "Bad URL "+stringURL );
}
showOutput( "FollowLinks("+stringURL+") finished" );
}
/** Given an InputStream, grab each CRLF delimited line and
scan it for links to other URLs. Calls FollowLinks for each
found URL.
@see InputStream
@see DataInputStream
@see FollowLinks
@see StringTokenizer
*/
void FindLinks( String url, InputStream is ) {
int lineNo = 1;
showOutput( "Checking file: "+url );
// First get the base directory of this HTML doc
int index = url.lastIndexOf( "/" );
String baseDir = new String(url.substring( 0, index ));
DataInputStream dis = new DataInputStream( is );
while( true ) {
try {
if( c.bSkip == true ) {
showOutput( "Interrupting "+url );
c.bSkip = false;
break;
}
String line = dis.readLine();
if( line == null )
break;
for( int i = 0; i < linkStrings.length; i++ ) {
int startIndex = 0;
while( true ) {
int ret = line.indexOf(linkStrings[i],startIndex);
if( ret == -1 )
break;
String subLine = new String( line.substring(ret));
StringTokenizer st =
new StringTokenizer(subLine,"<>");
String element = (String)st.nextElement();
st = new StringTokenizer( element, "=" );
st.nextElement();
st = new StringTokenizer( (String)st.nextElement());
String ourLink =
new String( (String )st.nextElement());
int colon = ourLink.indexOf(":");
int first = ourLink.indexOf("\"");
int last = ourLink.lastIndexOf("\"");
if( first != -1 && last != -1 && first != last ){
ourLink =
new String( ourLink.substring(first+1,last));
}
if( colon == -1 ) { // relative URL
char ca[] = new char[1];
ourLink.getChars( 0, 1, ca, 0);
if( ca[0] == '#' ) {
showOutput("skipping name relative link "+ourLink );
startIndex = ret+1;
continue;
}
else
System.out.println("relative url baseDir = "+
baseDir+" ourLink = "+ourLink );
ourLink = new String(baseDir+"/"+ourLink );
}
FollowLinks( ourLink );
startIndex = ret+1;
}
}
lineNo++;
} catch( IOException e ) {break;}
}
showOutput( "Finished checking file: "+url );
}
void showOutput( String s ) {
c.lineList.addItem( s );
System.out.println( s );
}
}
To make this applet check the links on a Web site, we enter the URL of a Web page in the text field, then press the Check URL button. When the user presses that button, the applet creates a LinkFollower object, passing the applet and the String URL to the constructor. Once running (via Thread.start), the new LinkFollower object goes through the following steps:
1. Connect to the URL specified in the text field.
2. Download the document found there.
3. Scan the document InputStream for any of the strings that indicate a hyperlink.
4. If it finds one:
Pull the target of the hyperlink from the text.
Turn that hyperlink target into a URL.
Go back to step 2, using the new URL.
As you can see, this is clearly recursive. FindLinks calls FollowLinks, which calls back to FindLinks.
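The same traversal can be written without recursion, which avoids deep call stacks on large sites. The sketch below is an alternative formulation, not the applet's code: it walks a hypothetical in-memory link graph (page name to the pages it links to) using an explicit stack and a visited set. Pushing each page's links in reverse keeps the depth-first, document-order visiting that the recursive FollowLinks/FindLinks pair produces, and the visited set is what keeps pages that link to each other from looping forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlOrder {
    // Visits every page reachable from `start` in a hypothetical link graph
    // and returns them in the order they were first checked.
    public static List<String> crawl(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(start);
        while (!toVisit.isEmpty()) {
            String page = toVisit.pop();
            if (!visited.add(page)) continue; // already checked: skip it
            order.add(page);
            List<String> out = links.getOrDefault(page, Collections.emptyList());
            // Push in reverse so the first link on the page is explored first
            for (int i = out.size() - 1; i >= 0; i--) {
                toVisit.push(out.get(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new HashMap<>();
        links.put("A", Arrays.asList("B", "C"));
        links.put("B", Arrays.asList("A"));      // B links back to A: a cycle
        links.put("C", Arrays.asList("A", "B"));
        System.out.println(crawl(links, "A"));   // [A, B, C]
    }
}
```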
In parsing the HTML, we use the same basic technique we used in the word searches of Listings 7.1 and 7.3. Instead of a single search word, we look for any of three strings that indicate hyperlinks. These strings are defined in the linkStrings array, which we create in the LinkFollower constructor shown here:
linkStrings = new String[3];
linkStrings[0] = new String("<A HREF=");
linkStrings[1] = new String("<IMG SRC=");
linkStrings[2] = new String("<applet code=");
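This brute-force parsing works surprisingly well, although it’s a far cry from the kind of rigorous syntax checking a commercial product would need to do. For example, indexOf is case-sensitive, so a lowercase <a href= (or an uppercase <APPLET CODE=) slips past it, as does an HREF written with spaces around the equals sign.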
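A slightly sturdier variant of this brute-force scan is sketched below. It is still not a real HTML parser, and the class LinkExtractor and its methods are hypothetical. It matches the markers case-insensitively, accepts double-quoted, single-quoted, or unquoted values, skips same-page “#name” links, and resolves each link with the two-argument URL(URL context, String spec) constructor instead of gluing strings onto a base directory, so “/top.html” and “../up.html” come out right. It assumes ASCII markup, so lowercasing the line does not shift character offsets.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LinkExtractor {
    // Finds the value after each marker (e.g. "<a href=") on one line of
    // HTML, ignoring case, and resolves it against the page's own URL.
    public static List<String> extract(String line, String pageUrl, String... markers) {
        List<String> links = new ArrayList<>();
        URL page;
        try {
            page = new URL(pageUrl);
        } catch (MalformedURLException e) {
            return links;
        }
        String lower = line.toLowerCase(Locale.ROOT); // ASCII assumed
        for (String marker : markers) {
            String m = marker.toLowerCase(Locale.ROOT);
            if (m.isEmpty()) continue;
            int at = lower.indexOf(m);
            while (at != -1) {
                int start = at + m.length();
                String value = readValue(line, start);
                if (!value.isEmpty() && !value.startsWith("#")) { // skip same-page anchors
                    try {
                        // Handles "x.html", "/x.html", "../x.html" and absolute URLs
                        links.add(new URL(page, value).toString());
                    } catch (MalformedURLException e) {
                        // e.g. an unknown protocol such as "javascript:"; skip it
                    }
                }
                at = lower.indexOf(m, start);
            }
        }
        return links;
    }

    // Reads a quoted value, or an unquoted one ending at whitespace or '>'.
    private static String readValue(String s, int i) {
        if (i >= s.length()) return "";
        char q = s.charAt(i);
        if (q == '"' || q == '\'') {
            int end = s.indexOf(q, i + 1);
            return end == -1 ? "" : s.substring(i + 1, end);
        }
        int end = i;
        while (end < s.length() && s.charAt(end) != '>'
                && !Character.isWhitespace(s.charAt(end))) {
            end++;
        }
        return s.substring(i, end);
    }

    public static void main(String[] args) {
        String line = "<a href=\"../up.html\">up</a> <IMG SRC=pic.gif> <A HREF=\"#top\">top</a>";
        System.out.println(extract(line, "http://www.mymachine.com/docs/index.html",
                "<a href=", "<img src="));
        // [http://www.mymachine.com/up.html, http://www.mymachine.com/docs/pic.gif]
    }
}
```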
One of the biggest problems in writing an applet like this is a by-product of the structure of the Web itself—circular references. In the most basic case, if you have two pages that contain links to one another, an unsophisticated Web crawler will sit spinning in an endless loop.
Our checker takes a number of steps to try to prevent this. One is to store the URL of each page we visit, so that we never visit any page more than once. That’s the purpose of the Hashtable hash. In FollowLinks, we check each element of the Hashtable against the URL we’re about to check as follows:
URL u = new URL(stringURL);
Enumeration en = hash.elements();
for( int i = 0; i < hash.size(); i++ ) {
URL storedU = (URL)en.nextElement();
if( u.sameFile(storedU) == true ) {
s = new String( "already checked -> "+u );
System.out.println( s );
showOutput( s );
return;
}
}
hash.put( u.toString(), u );
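Now, we could have just used Hashtable.contains to see if this URL was already there. However, a number of different URLs can describe the same file. URL.sameFile covers one common case: it compares two URLs while ignoring any #anchor reference, so links to different anchors on the same page count as one file.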
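The sketch below (SameFileDemo is a hypothetical name) shows sameFile ignoring the fragment. It also shows one possible alternative to scanning every stored URL: build a fragment-free key with the four-argument URL constructor and keep it in a hash set for a constant-time lookup. The key compares host names as plain text, while sameFile may resolve host names to addresses in current JDKs (and so may block on DNS). For that reason the example uses the literal address 127.0.0.1.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class SameFileDemo {
    // URL.sameFile compares two URLs while ignoring the "#fragment" part.
    public static boolean sameFile(String a, String b) {
        try {
            return new URL(a).sameFile(new URL(b));
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Hypothetical helper: the URL rebuilt without its fragment, usable as
    // a key in a HashSet or Hashtable of already-checked pages.
    public static String keyOf(String spec) {
        try {
            URL u = new URL(spec);
            return new URL(u.getProtocol(), u.getHost(), u.getPort(), u.getFile()).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String a = "http://127.0.0.1/docs/page.html#top";
        String b = "http://127.0.0.1/docs/page.html";
        System.out.println(sameFile(a, b)); // true
        Set<String> checked = new HashSet<>();
        checked.add(keyOf(b));
        System.out.println(checked.contains(keyOf(a))); // true
    }
}
```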
We also need a mechanism for breaking out of unwanted branches in the Web site’s hyperlink structure. In my own Web site, I have links to both java.sun.com and microsoft.com. I certainly don’t want to check the links to those Web sites along with my own. The simplest way to do that is to provide a Skip button that breaks you out of the lowest level of link checking: the loop in FindLinks where lines are read and parsed.
The Skip button is created in the init method and, as you would expect, clicking it causes an event that gets passed to handleEvent. When we detect a Skip button press in handleEvent, we simply set the boolean bSkip. The LinkFollower runs asynchronously in its own thread. When bSkip is set, the LinkFollower could be executing anywhere in the run, FollowLinks, or FindLinks methods, but more than likely it will be down in the line-reading loop of FindLinks. Thus, we let FindLinks finish dealing with whatever line it’s on, then check bSkip before starting the next line. If bSkip is set, we clear it and skip the rest of this document.
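The flag logic can be isolated and tested on its own. In this sketch (the class SkippableScan and its methods are hypothetical), the flag is declared volatile. Under the Java memory model, a plain boolean written by one thread is not guaranteed to become visible to a loop running in another thread, while a volatile one is. The usage in main is single-threaded, so it exercises only the check-then-clear behavior; in the applet, requestSkip would be called from the event-handling thread.

```java
import java.util.Arrays;
import java.util.List;

public class SkippableScan {
    // Written by the UI thread, read by the worker thread; volatile makes
    // the write visible to the worker.
    private volatile boolean skip = false;

    public void requestSkip() {
        skip = true;
    }

    // Processes lines until the list ends or a skip is requested. The flag
    // is checked once per line, before the line is handled, and cleared
    // when it is honored.
    public int scan(List<String> lines) {
        int processed = 0;
        for (String line : lines) {
            if (skip) {
                skip = false;
                break;
            }
            processed++; // stand-in for parsing the line for links
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("line 1", "line 2", "line 3");
        SkippableScan scan = new SkippableScan();
        System.out.println(scan.scan(lines)); // 3
        scan.requestSkip();
        System.out.println(scan.scan(lines)); // 0: skipped before the first line
        System.out.println(scan.scan(lines)); // 3: the flag was cleared
    }
}
```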
Using Sockets
Sockets are a form of interprocess communication that allows processes on different network hosts to communicate. They originated with Berkeley Unix and have spread to become the de facto standard for Internet communication. A form of sockets, Winsock, has also taken hold in the Windows world, to the point that most Internet-capable Windows applications conform to some version of Winsock.
Sockets are actually an interface—a set of function calls that your application can call and be guaranteed a particular response. Each operating system that supports sockets implements them in its own way, but all present the same interface to applications that wish to use those sockets. Thus, socket libraries in both the System V and BSD versions of Unix provide a function called gethostname, though each implements it differently.
Socket libraries generally consist of about two dozen functions, but there are really only a few functions you need to understand to get going with sockets. Table 7.1 lists the key socket functions.
Two connected sockets make a point-to-point communications channel. Each side of the conversation creates a socket (via socket). The server side binds the socket to a local address and port number (bind), listens for connections (listen), and accepts them as they arrive (accept). The client side simply connects to the server’s host name and port number (connect). When connect returns, the two sides can call send and recv to write to and read from the connection.
Sockets are at the heart of almost every instance of Internet communication. When, for instance, you point your Web browser at http://java.sun.com, the browser uses the socket interface to connect to a port on java.sun.com. Sockets are a simple, old, tried-and-true technology that make network programming fairly easy.
Socket Basics
There are two ends to each Java socket conversation: server and client. The server end is embodied in the ServerSocket class, while the client end is embodied in Socket. These two ends go through a specific set of steps to set up, conduct, and terminate a conversation, as shown here:
1. The server instantiates ServerSocket passing a local port number. This creates the socket and binds it to that local port number:
ServerSocket ServerS = new ServerSocket( 1037 );
2. The server accepts connections to this server socket by calling accept, which blocks until a client connects.
Socket AcceptedS = ServerS.accept();
3. The client instantiates Socket, passing a server name and port number. This creates a client end socket and connects it to the named port, on the named host. When this call returns, the two ends are connected.
Socket ClientSocket = new Socket( "www.mymachine.com", 1037 );
4. ServerSocket.accept returns a Socket that can now be used for I/O. Most applications will spawn a new thread to read and write this socket, while the ServerSocket continues to “accept” connections on the original port.
byte b[] = new byte[100];
AcceptedS.getOutputStream().write( b );
From this point on, both server app and client applet can read and write the connected sockets. This is a compression/simplification of the steps C programs using the socket interface would go through. On the server side, Java’s ServerSocket class compresses the socket creation, address binding, and listen calls into the constructor. On the client side, the Socket class compresses socket creation and connect into the constructor.
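The four steps can be exercised in a single process over the loopback interface. This is a minimal sketch, not the chapter's Snitch code; the class LoopbackEcho is hypothetical. One assumption differs from the text: it passes port 0, which asks the operating system for any free port, instead of the fixed 1037. A server thread accepts one connection and echoes one line back, and the client sends a line and returns the reply.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LoopbackEcho {
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoflush: println pushes the line onto the network immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static String roundTrip(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Step 1: create and bind the server socket (port 0 = any free port)
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                try (Socket accepted = server.accept(); // step 2: blocks until step 3
                     BufferedReader in = reader(accepted);
                     PrintWriter out = writer(accepted)) {
                    out.println(in.readLine());         // step 4: I/O on the accepted socket
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            // Step 3: the client connects to the server's host and port
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, socket")); // hello, socket
    }
}
```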
How does one side know when the conversation is over? Many protocols call for a “goodbye” message, but depending on something like that won’t get you very far, because lost connections are a fact of life. When the other side closes its end cleanly, reads simply report end of stream: InputStream.read returns -1 and readLine returns null. When the connection is interrupted, Java throws an IOException in almost every case. You must catch and handle IOExceptions properly to write usable network communications code.
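Here is a small, hedged demonstration of a clean close as seen from the server side. EndOfStreamDemo is a hypothetical name, and port 0 again stands in for a fixed port. A client connects and closes at once without sending anything, and the server's readLine then returns null instead of blocking. An abrupt failure, such as a reset connection, would instead surface as an IOException in the catch block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class EndOfStreamDemo {
    public static boolean serverSeesEndOfStream() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(accepted.getInputStream(), StandardCharsets.UTF_8))) {
            client.close(); // clean close by the peer
            return in.readLine() == null; // null means end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e); // abrupt failures land here
        }
    }

    public static void main(String[] args) {
        System.out.println(serverSeesEndOfStream()); // true
    }
}
```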
The Snitcher Applet
With those basic ideas well in hand, let’s construct an application/applet combo that does some very simple socket communication. The purpose of this combination is to record the date/time, URL, and IP address of the user whenever someone accesses the HTML page in which this applet is embedded. This is one of the holy grails of Web publishers: to be able to know who is hitting their page and when. This applet, Snitcher, is unusual in that it has no user interface. The user never sees it.
The theory behind the system is simple enough. The server Java application, Snitch, runs all the time on the server, accepting connections on port 1038. When a user loads the page with our applet in it, the applet starts up, gets the page URL, the host name, and the IP address, and packages all that information into a message. Then it connects to the server socket and sends the message to the server, which stores it in a file where it can later be retrieved and analyzed. Listing 7.6 shows the Snitcher applet.
Listing 7.6 The Snitcher Applet
package chap7;
import java.awt.Graphics;
import java.awt.*;
import java.applet.Applet;
import java.net.*;
import java.lang.*;
import java.io.*;
import java.util.*;
/** An applet that reports the hostname and IP address of the
machine reading the HTML page back to the server from which
the HTML page was loaded.
@author John Rodley
@version 1.0 12/1/1996
*/
public class ch7_fig6 extends Applet {
boolean bAlreadyRan = false;
int port = 1038;
/** Resize the applet to almost nothing, and change the port
number that the applet will connect to, if the port parameter
is set in the applet tag.
*/
public void init() {
String sPort = getParameter( "port" );
if( sPort != null ) {
try {
port = Integer.parseInt( sPort.trim());
} catch( NumberFormatException e )
{System.out.println( "bad port parameter "+sPort );}
}
resize( 10, 10 );
}
/** Check if the snitcher has already informed on this user and only contact the server if we haven't run yet. Tries to guarantee that
we only get one report for each time the page is loaded.
*/
public void start() {
if( bAlreadyRan == false ) {
snitch();
bAlreadyRan = true;
}
}
/** Report the hostname and IP address of this machine to the
server.
@see InetAddress
@see URL
@see PrintStream
@see Socket
@see Snitch
*/
void snitch() {
// Get the local hostname and IP address
String sIpaddr = "Unknown ipaddr";
try {
InetAddress in = InetAddress.getLocalHost();
sIpaddr = in.toString();
} catch( UnknownHostException e )
{System.out.println("exception "+e );}
// Now get the URL of the HTML page we're running
URL u = getDocumentBase();
String sHost = new String( u.getHost());
try {
String snitchInfo = new String( u+" :::: "+sIpaddr);
System.out.println( "reporting snitchinfo "+snitchInfo );
Socket s = new Socket( sHost, port );
PrintStream p = new PrintStream( s.getOutputStream());
p.println( snitchInfo );
s.close();
} catch( IOException e )
{System.out.println( "ioexception "+e ); }
}
}
This applet is deceptively simple. Let’s look at it in detail.
The applet needs to create a Socket that is connected to a ServerSocket. This means that we need to know the name of the server host and the port number it’s accepting connections on. The port number is easy. We set that via a parameter in the <applet> HTML tag. The host name is a little trickier. As we’ve stated before (and will again), a security feature of some browsers requires that the server application run on the same host that the client applet is loaded from. This represents the “least-common-denominator” in network communication. Thus, we can get the name of the host simply by getting the URL of the HTML page, and then pulling the hostname from that, as shown here:
// Now get the URL of the HTML page we're running
URL u = getDocumentBase();
String sHost = new String( u.getHost());
getDocumentBase, an Applet method, returns the URL of the HTML document, and URL.getHost gives the String version of the host to which the URL points.
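Because an applet can't easily be run on its own, this sketch exercises the same URL accessors on the home page address from earlier in the chapter instead of calling getDocumentBase. The class name UrlHostDemo is hypothetical, and URI.create(...).toURL() is used only because the URL(String) constructor is deprecated in recent JDKs. If the page and the applet's class files might come from different hosts, getCodeBase().getHost() is probably the safer source of the host name, since browsers restrict applets to the host their code was loaded from.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Standalone illustration of the URL accessors the applet relies on. */
final class UrlHostDemo {
    private UrlHostDemo() {}

    /** Returns the host part of a URL string, as URL.getHost does. */
    static String hostOf(String spec) throws MalformedURLException {
        URL u = URI.create(spec).toURL();
        return u.getHost();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL u = URI.create("http://www.channel1.com/users/ajrodley/index.html").toURL();
        System.out.println(u.getHost());        // www.channel1.com
        System.out.println(u.getPath());        // /users/ajrodley/index.html
        System.out.println(u.getPort());        // -1: no explicit port in the URL
        System.out.println(u.getDefaultPort()); // 80, the http default
    }
}
```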
Using InetAddress
Now that we’re all set up to communicate with the server, we need the IP address and host name of the machine the browser (and the Snitcher applet) is running on, so that we can build the message we’ll actually send to the server. To do that, we use the InetAddress class, Java’s interface to the network name service. You can use it to turn a host name into an IP address or vice versa. The Snitcher applet uses the static method InetAddress.getLocalHost to get a complete description of the machine that the applet is running on. Notice that all we need to do to get the hostname/IP address string is call InetAddress.toString. This is a recurring theme in Java: every object has a toString method, and many classes override it to return a useful, human-readable description. toString is also what gets called when you append a non-String object to a String with the + operator, as in:
InetAddress in = InetAddress.getLocalHost();
String s = "blah blah blah"+in;
Java calls in.toString in order to append it to the first string. Since we know where to connect, and what we want to say, the network communication boils down to four lines in the snitch method:
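Since getLocalHost depends on the machine it runs on, this sketch instead builds an InetAddress with getByAddress, which takes a host name and raw address bytes and does no DNS lookup. The name snitchhost and the address 10.0.0.5 are made up for illustration. The sketch shows the hostname/address format that toString produces and the implicit toString call during string concatenation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Illustrates InetAddress accessors and toString without touching the network. */
final class InetAddressDemo {
    private InetAddressDemo() {}

    /** Builds an address without a DNS lookup; the name and bytes are made up. */
    static InetAddress sample() throws UnknownHostException {
        return InetAddress.getByAddress("snitchhost", new byte[] {10, 0, 0, 5});
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress in = sample();
        System.out.println(in.getHostName());    // snitchhost
        System.out.println(in.getHostAddress()); // 10.0.0.5
        // String concatenation calls in.toString() implicitly
        String s = "blah blah blah " + in;
        System.out.println(s);                   // blah blah blah snitchhost/10.0.0.5
    }
}
```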
Socket s = new Socket( sHost, port );
PrintStream p = new PrintStream( s.getOutputStream());
p.println( snitchInfo );
s.close();
As in other I/O examples, we take a bare OutputStream returned by Socket.getOutputStream, turn it into a more capable Stream—in this case, a PrintStream just like System.out—and use that new Stream to write a String to the Socket.
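Here is a sketch of the same send written for a current JDK, in a hypothetical SnitchClient class. It assumes UTF-8 on the wire (the listing uses the platform default encoding) and swaps PrintStream for a PrintWriter with autoflush turned on. try-with-resources closes the writer and the socket even if something fails partway through.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Sketch of the applet's four-line send, written for a current JDK. */
final class SnitchClient {
    private SnitchClient() {}

    static void send(String host, int port, String message) throws IOException {
        // Resources close in reverse order: writer first (flushing it), then the socket
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(message); // autoflush=true pushes the line onto the socket
        }
    }
}
```

One caveat of this design: like PrintStream, PrintWriter never throws IOException from println, so a caller that cares about delivery can call out.checkError() right after println.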
The Snitch Application
That covers the client Snitcher applet, but we still need a server Snitch application for the Snitcher applet to talk to. Listing 7.7 shows the server Snitch application.
Listing 7.7 Standalone Server Snitch Application
package chap7;
import java.awt.*;
import java.lang.*;
import java.util.*;
import java.net.*;
import java.io.*;
/** A standalone socket connection server that simply writes
everything it receives over the socket connection to a
day file. When the date changes, the server opens a new file.
The intent is that applets will connect and report the HTML
page's URL and the client's host name and IP address; the
server adds the date/time, allowing the Web page owner to
know who hits the page and when.
@version 1.0
@author John Rodley
@see ch7_fig6
*/
public class Snitch extends Thread {
public static ServerFrame f;
static public volatile boolean bRun = true;
static Panel p;
MenuBar m;
SrvSocket s;
Acceptor acceptor;
public static Snitch currentSnitch;
String filename = "Report.web";
PrintStream ps;
/** The main function for this standalone application.
Corresponds directly to the main function in a C application.
@param argv The arguments to this application. Currently
takes none.
*/
public static void main(String argv[] ) {
Snitch as = new Snitch();
Properties p = System.getProperties();
try {
p.load(
new FileInputStream("/users/default/.hotjava/properties"));
} catch( IOException e ) {System.out.println("except "+e ); }
System.out.println( "system properties "+p );
String topDirectory = System.getProperty( "acl.read" );
if( topDirectory == null ) {
System.out.println( "can't read this machine" );
}
else
System.out.println( "got "+topDirectory+" for acl.read");
as.start();
}
/** Constructor. Creates a unique file via switchFiles for
logging, an acceptor thread for accepting connections on the
port, and a main window for user interaction. Currently just
runs, and exits on command.
@see switchFiles
@see ServerFrame
@see Acceptor
@see awt.Frame
@see awt.MenuBar
@see awt.Panel
@see awt.Layout
@see awt.Menu
@see awt.MenuItem
*/
public Snitch() {
switchFiles();
currentSnitch = this;
f = new ServerFrame();
f.resize(300, 300);
f.show();
p = new Panel();
p.reshape( 0, 0, 300, 300 );
p.setLayout( new FlowLayout());
f.add( p );
m = new MenuBar();
f.setMenuBar( m );
Menu m1 = new Menu("File");
m.add(m1);
MenuItem m2 = new MenuItem( "Exit" );
m1.add( m2 );
acceptor = new Acceptor( this );
acceptor.start();
}
/** Reports a line of text received over the socket connection
to the unique log file created by switchFiles. Synchronized
so that entire entries are written as one lump.
@see Date
@see PrintStream
@see OutputStream
*/
public synchronized void Report( String msg ) {
if( ps == null )
return;
Date d = new Date();
ps.println( new String(d+" :::: "+msg) );
}
/** Create a log file with a unique name formatted as:
"M" The letter M
mm One or two digit month 1-12
"D" The letter D
dd One or two digit day of month
"Y" The letter Y
yy One or two digit year offset from 1900
".w" Dot and letter w
hh One or two digit hour
mm One or two digit minute
This gives us a file that's guaranteed to be unique both to
the day, and within the day so that the server can be stopped
and restarted within a day.
@see Date
@see File
@see FileOutputStream
@see PrintStream
*/
public void switchFiles() {
Date d = new Date();
filename = new String( "M"+(d.getMonth()+1)+"D"+d.getDate()+
"Y"+d.getYear()+".w"+d.getHours()+""+d.getMinutes() );
try {
File fi = new File( filename );
ps = new PrintStream( new FileOutputStream(fi) );
} catch( IOException e )
{ System.out.println( "ioexception e "+e );}
}
/** The run loop for the snitcher thread. Wakes up once
per second and checks the date to see if we should switch log
files. This is far too often for the date checking, but the
exit flag bRun is only checked when we wake up. Thus, if we
change the sleep time to 1 minute, when the user closes
Snitch, it sits for a whole minute before closing the app -
unacceptable.
@see switchFiles
@see Date
@see Acceptor
*/
public void run() {
boolean bLast = false;
Date d = new Date();
int lastday;
int today;
today = d.getDay();
lastday = today;
f.setTitle( "Snitch" );
while( bRun == true ) {
d = new Date();
today = d.getDay();
if( today != lastday )
switchFiles();
lastday = today;
// Wake up once per second and check the date
try {Thread.sleep( 1000 );} catch( Exception e ) { }
}
acceptor.stop();
System.out.println( "out of run loop" );
f.dispose();
System.exit(0);
}
}
/** The frame window for this standalone application. Exists
only to provide a way to kill the server. Handles kill via
the system menu and the file menu.
@author John Rodley
@version 1.0
*/
class ServerFrame extends Frame {
/** Handle close from the system menu.
*/
public synchronized boolean handleEvent(Event evt) {
if( evt.id == Event.MOUSE_UP ) {
return( true );
}
else
{
if( evt.target instanceof Frame ) {
if( evt.id == Event.WINDOW_DESTROY ) {
Snitch.currentSnitch.bRun = false;
System.out.println( "window destroy "+evt );
return( true );
}
else
return super.handleEvent(evt);
}
else
return super.handleEvent(evt);
}
}
/** Handle exit from the file menu.
*/
public boolean action( Event evt, Object o ) {
if( evt.target instanceof MenuItem )
{
if( evt.arg.toString().compareTo( "Exit" ) == 0 )
{
Snitch.currentSnitch.bRun = false;
System.out.println( "action event "+evt );
}
else
{
}
}
return( true );
}
}
/** Handle reading and closing a socket which has already been
accepted.
@see Report
@see Thread
@see AcceptedSocket
@author John Rodley
@version 1.0
*/
class SocketHandler extends Thread {
public AcceptedSocket as;
FileOutputStream outputFile;
boolean bDispatcher = false;
boolean bContinue = true;
/** Simply saves the Socket that's passed as an argument.
@arg Socket This socket is saved and used within the run method to
read from.
@see AcceptedSocket
@see Socket
*/
public SocketHandler( Socket so ) {
as = new AcceptedSocket( so );
}
/** The run loop for this thread. Does a single blocking read
from the Socket that was supplied to the constructor for a
maximum of 1024 bytes and then closes the socket and exits the
thread. Passes whatever is read to Report for logging in the
day file. The small, single read is done for security
purposes. A malicious app could still flood the log, but it
would have to re-connect every time—an expensive and
dangerous proposition.
@see Report
*/
public void run() {
int ret;
byte buffer[] = new byte[1024];
if(( ret = as.readLine( buffer )) > 0 )
{
Snitch.currentSnitch.Report( new String(buffer,0,0,ret ));
System.out.println( "read "+ new String(buffer,0,0,ret));
}
as.close();
}
}
/** A thread that simply sits in a loop accepting connections
on the port and spawning other threads to read the accepted
socket.
@author John Rodley
@version 1.0
@see SrvSocket
@see SocketHandler
@see Snitch
*/
class Acceptor extends Thread {
Snitch as;
SrvSocket s;
/** Constructor - daemonize this thread and save the Snitch
for later use.
*/
public Acceptor( Snitch a ) {
setDaemon( true );
as = a;
}
/** The run loop for this thread. Sits in a loop accepting
connections. Whenever a client connects, we create a
SocketHandler thread using that accepted Socket and start the
thread up. Runs until "stopped" from above.
@see Socket
@see SrvSocket
@see SocketHandler
*/
public void run() {
// set up the server socket
s = new SrvSocket( 1038 );
while( true ) {
Socket newS = s.Accept();
if( newS == null ) // accept failed; keep listening
continue;
SocketHandler a = new SocketHandler( newS );
a.start();
}
}
}
/** Class representing a server socket bound to a local port.
@see ServerSocket
@version 1.0 August 1, 1995
@author John Rodley
*/
class SrvSocket {
ServerSocket s;
Socket newS;
/** Constructor creates a ServerSocket bound to a local port.
@arg port The integer local port number that this socket will
be bound to.
@see ServerSocket
*/
public SrvSocket( int port ) {
s = null;
while( s == null ) {
System.out.println( "Accepting on host port:"+port );
try {
s = new ServerSocket( port );
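} catch( IOException e ) {
System.out.println( "exception "+e );
// Port probably in use: pause before retrying rather than spin
try { Thread.sleep( 1000 ); } catch( InterruptedException ie ) { }
}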
}
}
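/** Accept a connection on this port and return the new
socket. Swallow any exceptions.
@return The accepted Socket, or null if accept failed.
@see Socket
@see ServerSocket.accept
*/
public Socket Accept() {
Socket accepted = null;
try {
accepted = s.accept();
System.out.println( "Accepted on host port" );
} catch( IOException e )
{ System.out.println( "exception "+e ); }
// Return null on failure rather than a stale, already-handled socket
return( accepted );
}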
}
/** A socket that has been accepted, meaning that there is a
client now attached to it.
@see InputStream
@see OutputStream
@author John Rodley
@version 1.0
*/
class AcceptedSocket {
public InputStream inputStream;
public OutputStream outputStream;
public DataInputStream dis;
Socket s;
/** Constructor - creates input and output streams that read
and write can use.
@arg so The accepted Socket, saved for further use.
@see InputStream
@see OutputStream
*/
public AcceptedSocket( Socket so ) {
s = so;
try {
inputStream = s.getInputStream();
outputStream = s.getOutputStream();
} catch( IOException e )
{ System.out.println( "exception "+e); }
}
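/** Read a line terminated by one of the usual suspects - \r
and/or \n. Accomplish this by making a DataInputStream from
our base InputStream. Copies at most buffer.length bytes.
@return The number of bytes copied, or -1 at end of stream
or on error.
@see DataInputStream
*/
public int readLine( byte buffer[] ) {
String s;
try {
dis = new DataInputStream(inputStream);
s = dis.readLine();
} catch( IOException e )
{ System.out.println("exception "+e); return( -1 );}
// readLine returns null if the client closed without sending a line
if( s == null )
return( -1 );
// Copy no more than the caller's buffer can hold
int len = Math.min( s.length(), buffer.length );
s.getBytes( 0, len, buffer, 0 );
return( len );
}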
/** Read an array of bytes from the socket.
@return The number of bytes read.
*/
public int read(byte buffer[], int length) {
try {
return( inputStream.read(buffer, 0, length));
} catch( IOException e )
{ System.out.println("exception "+e); return( -1 ); }
}
/** Write an array of bytes to the socket. */
public void write(byte buffer[], int length) {
try {
outputStream.write(buffer, 0, length);
} catch( IOException e )
{ System.out.println( "exception "+e); }
}
/** Close the socket. */
public void close() {
try {
s.close();
} catch( IOException e )
{ System.out.println( "exception "+e); }
}
}
The starting point for any standalone application is the main method. Snitch’s main method accomplishes the following tasks:
• Instantiates the Snitch class, whose constructor does most of the setup
• Loads a set of “properties” into the System’s properties list
• Queries the property “acl.read” for the path of the directory in which the day file is meant to be created (this version only prints the value)
• Starts the new Snitch instance by calling Thread.start
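Loading properties works the same way from any Reader or InputStream. This sketch reads them into a private Properties object rather than the system list, so it has no global side effects, and it falls back to a default when acl.read is missing. The class name and the sample values used with it are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Sketch: reading the acl.read setting from a properties source. */
final class AclProperty {
    private AclProperty() {}

    /** Loads key=value pairs and returns acl.read, or fallback when it is absent. */
    static String aclRead(Reader source, String fallback) throws IOException {
        Properties p = new Properties();
        p.load(source); // key=value lines; lines starting with '#' are comments
        return p.getProperty("acl.read", fallback);
    }
}
```

Passing new FileReader("/users/default/.hotjava/properties") as the source would read the same file the listing loads.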
A close look at Snitch.java reveals a basic skeleton that most server applications follow. The top-level thread does almost nothing except create a Frame object (window) for accepting user input, and create another thread to accept connections to a ServerSocket. This is what happens in the Snitch constructor. We create our frame window and populate it with child windows, in this case a panel and a menu bar holding a File menu with an Exit item. We also create an acceptor thread and set it running.
The Acceptor class merely creates a ServerSocket on the local host bound to port number 1038. Each time a client applet connects to this server socket, the acceptor thread creates a new thread (a SocketHandler) to read from the accepted socket and deal with whatever the client sends us, in this case by writing a line of text to the day file.
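The accept-then-spawn pattern that Acceptor and SocketHandler implement can be condensed into a single class on a current JDK. The sketch below is a stand-in under stated assumptions, not the book's code: LineAcceptor is a hypothetical name, a Consumer<String> takes the place of Snitch.Report, and closing the ServerSocket (rather than calling Thread.stop) is what ends the accept loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical condensed Acceptor: one thread accepts, one thread per client reads. */
final class LineAcceptor implements Runnable {
    private final ServerSocket server;
    private final Consumer<String> report; // stands in for Snitch.Report

    LineAcceptor(ServerSocket server, Consumer<String> report) {
        this.server = server;
        this.report = report;
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            try {
                Socket client = server.accept();           // blocks until a client connects
                Thread handler = new Thread(() -> handle(client));
                handler.setDaemon(true);
                handler.start();
            } catch (SocketException closed) {
                return;                                     // treated as shutdown: stop accepting
            } catch (IOException e) {
                System.out.println("accept failed: " + e); // log and keep accepting
            }
        }
    }

    /** Like SocketHandler.run: a single readLine, then close. */
    private void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            String line = in.readLine(); // null if the client sent nothing
            if (line != null) {
                report.accept(line);
            }
        } catch (IOException e) {
            System.out.println("read failed: " + e);
        }
    }
}
```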
Just as a File is read or written through a FileInputStream or FileOutputStream, the Socket class provides two methods, getInputStream and getOutputStream, that return base stream objects through which we can do whatever style of I/O we wish. The AgentServer, for the most part, does non-delimited, byte-level I/O using the bare InputStream. Snitch, on the other hand, receives lines of text terminated by CR, LF, or CRLF from its client applets. Thus, it creates a DataInputStream from the Socket’s bare InputStream and uses DataInputStream.readLine rather than InputStream.read. You will find that almost all I/O operates this way: you take a bare InputStream or OutputStream, then create a more sophisticated, higher-level stream, like DataInputStream, on top of that bare stream.
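DataInputStream.readLine has been deprecated since JDK 1.1 because it does not convert bytes to characters properly. The replacement the JDK documentation recommends is BufferedReader.readLine. This sketch layers the streams in the same way the text describes (bare InputStream, then InputStreamReader, then BufferedReader) and caps the result length to loosely mirror SocketHandler's 1024-byte buffer. The class name and the UTF-8 encoding are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Sketch: layering a line reader over a bare InputStream. */
final class LineLayering {
    private LineLayering() {}

    /**
     * Reads one line from a raw stream, or returns null at end of stream.
     * Lines longer than maxChars are truncated; the reader still consumes the whole line.
     */
    static String readOneLine(InputStream raw, int maxChars) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
        String line = in.readLine(); // accepts "\n", "\r", or "\r\n" as the terminator
        if (line == null) {
            return null;
        }
        return line.length() > maxChars ? line.substring(0, maxChars) : line;
    }
}
```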
The other big difference between Snitch and more complicated server applications like AgentServer is that we do only a single readLine before closing the socket; most servers keep reading the socket until the client applet disconnects.
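For comparison, the loop that a longer-lived server would run looks roughly like this sketch (hypothetical names, UTF-8 assumed). readLine returns null only when the peer closes its side, so that null is the natural end-of-conversation signal.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Sketch of the "keep reading until the client disconnects" loop. */
final class UntilDisconnect {
    private UntilDisconnect() {}

    /** Passes every line to handler; returns the line count once the peer closes. */
    static int readAll(InputStream raw, Consumer<String> handler) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
        int count = 0;
        String line;
        // readLine returns null only at end of stream, i.e. when the peer closes its side
        while ((line = in.readLine()) != null) {
            handler.accept(line);
            count++;
        }
        return count;
    }
}
```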
File I/O
Something in the server Snitch application that we haven’t seen before is file I/O via the File class. File I/O is not useful to applets because the browser SecurityManagers generally do not allow applets to use it. Period. When you’re writing a Java standalone application like Snitch, on the other hand, you’re free to do whatever I/O you might want. Snitch’s use of file I/O is limited to:
• Creating a new day file
• Writing whatever lines come over the Socket into that day file
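The writing half can be sketched as a small class that mirrors Snitch.Report: a synchronized method that stamps each line with the date and the " :::: " separator. DayFileLog is a hypothetical name. Unlike the listing, which opens the file with new FileOutputStream(fi) and therefore truncates any existing file, the sketch lets the caller choose append mode.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Date;

/** Hypothetical stand-in for Snitch's day-file logging. */
final class DayFileLog implements AutoCloseable {
    private final PrintWriter out;

    /** append=true keeps earlier entries if the file already exists. */
    DayFileLog(String filename, boolean append) throws IOException {
        out = new PrintWriter(new OutputStreamWriter(
            new FileOutputStream(filename, append), StandardCharsets.UTF_8), true);
    }

    /** synchronized so concurrent handler threads write whole entries, one at a time. */
    synchronized void report(String msg) {
        out.println(new Date() + " :::: " + msg);
    }

    @Override
    public synchronized void close() {
        out.close();
    }
}
```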
Creating the new day file is the job of the switchFiles method:
public void switchFiles() {
Date d = new Date();
filename = new String( "M"+(d.getMonth()+1)+"D"+d.getDate()+
"Y"+d.getYear()+".w"+d.getHours()+""+d.getMinutes() );
try {
File fi = new File( filename );
ps = new PrintStream( new FileOutputStream(fi) );
} catch( IOException e )
{ System.out.println( "ioexception e "+e );}
}
The point of switchFiles is to create a file whose name is unique and indicates the date it is associated with. In normal operation, the system creates one day file for each day. Since the day file name also contains an hour/minute indicator, you can stop and restart the server within a day and end up with two day files for that day (a restart within the same minute, however, reuses the earlier name, and FileOutputStream overwrites that file). As with all I/O, we create the basic I/O object (a File), get a base Stream object (in this case, a FileOutputStream), and create a higher-level Stream (in this case, a PrintStream) to do our actual I/O against.
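The naming scheme can be reproduced with java.time in place of the deprecated Date getters. One thing the sketch makes visible: because hour and minute are appended without padding, different start times can yield the same name (1:11 and 11:01 both end in ".w111"), and the "years since 1900" field becomes three digits from 2000 on. The padded variant is an alternative, not the book's scheme, and the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.util.Locale;

/** Sketch of the switchFiles naming scheme using java.time. */
final class DayFileName {
    private DayFileName() {}

    /** Same fields as the listing: month, day, years since 1900, then unpadded hour and minute. */
    static String legacy(LocalDateTime t) {
        return "M" + t.getMonthValue() + "D" + t.getDayOfMonth()
            + "Y" + (t.getYear() - 1900) + ".w" + t.getHour() + t.getMinute();
    }

    /** Alternative with fixed-width fields, so different times cannot produce the same name. */
    static String padded(LocalDateTime t) {
        // Locale.ROOT guarantees ASCII digits regardless of the default locale
        return String.format(Locale.ROOT, "M%02dD%02dY%04d.w%02d%02d",
            t.getMonthValue(), t.getDayOfMonth(), t.getYear(), t.getHour(), t.getMinute());
    }
}
```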
When does switchFiles get called? Well, it gets called once at startup, from the Snitch constructor. In normal running, the main loop of Snitch simply sleeps for a second, then wakes up and checks the date. If we’ve rolled past midnight, the main loop calls switchFiles to create a new day file.
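The midnight check in Snitch.run compares Date.getDay values, which are days of the week (0 through 6). That value still changes at every midnight, so the check works, but comparing whole dates states the intent directly. A minimal sketch with java.time, using a hypothetical DayRollover class whose caller would invoke switchFiles whenever check returns true:

```java
import java.time.LocalDate;

/** Sketch of the midnight check: remember the last date seen and report a change. */
final class DayRollover {
    private LocalDate lastDay;

    DayRollover(LocalDate start) {
        lastDay = start;
    }

    /** Returns true exactly once per new calendar date. */
    boolean check(LocalDate today) {
        if (today.equals(lastDay)) {
            return false;
        }
        lastDay = today;
        return true;
    }
}
```

In a loop like Snitch.run, the call would look like if( roll.check( LocalDate.now() )) switchFiles();.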
To Block, or Not to Block
One of the key characteristics of any I/O operation is whether or not it blocks. If you call DataInputStream.readLine, no matter how many bytes it has read, it will not return until it reads a line terminator (or reaches the end of the stream). It “blocks” until the line terminator is read. If readLine were non-blocking, it would return immediately whether or not it had read a line terminator.
Many coders, especially those who grew up in the bad old days of single threading, prefer to write their communication code as non-blocking. In single-threaded systems, there are very good reasons for this, one being that code that blocks often fails to unblock.
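That reasoning doesn’t hold up in Java. In the JDK this chapter was written for, a blocked thread could in principle be unblocked by calling Thread.stop (throwing a ThreadDeath at it); Thread.stop has since been deprecated as inherently unsafe, and the usual way to free a thread blocked on a socket is to close that socket from another thread, which makes the blocked call throw a SocketException. Java also lacks the Unix select call that Unix coders relied on to write non-blocking I/O, at least until the selectors of the java.nio package arrived in Java 1.4. InputStream.available does exist, but it only reports how many bytes can be read without blocking.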
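A thread blocked in a socket call can be freed without Thread.stop by closing the socket from another thread; the JDK documents that a thread blocked in ServerSocket.accept throws a SocketException when the socket is closed. In this sketch the class name is hypothetical and the 200 ms pause is only a timing guess. The result holds either way, because accept on an already-closed socket also throws SocketException.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: freeing a thread blocked in accept() by closing the socket from another thread. */
final class UnblockByClose {
    private UnblockByClose() {}

    /** Returns the exception the blocked thread saw once the socket was closed under it. */
    static Exception blockThenClose() throws IOException, InterruptedException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        AtomicReference<Exception> seen = new AtomicReference<>();
        Thread waiter = new Thread(() -> {
            try {
                server.accept();   // no client ever connects, so this blocks
            } catch (IOException e) {
                seen.set(e);       // expected: a SocketException after close()
            }
        });
        waiter.start();
        Thread.sleep(200);         // give the waiter time to block (timing assumption)
        server.close();            // unblocks accept() in the other thread
        waiter.join(5000);
        return seen.get();
    }
}
```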
Almost all Java I/O calls block, including socket connect, stream read, and stream write. Some allow the operation to time out, but for the most part, you are literally required to thread and block.
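Read timeouts are set with Socket.setSoTimeout. Once the timeout expires, a blocked read throws SocketTimeoutException and the socket remains usable; connect timeouts are a separate argument to Socket.connect. This sketch, with a hypothetical class name, connects to a local server that never writes and reports whether the read timed out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Sketch: bounding a blocking read with Socket.setSoTimeout. */
final class ReadTimeout {
    private ReadTimeout() {}

    /** Connects to a server that never writes and reports whether the read timed out. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {   // accepted but left silent
            client.setSoTimeout(timeoutMillis);  // 0 would mean wait forever
            InputStream in = client.getInputStream();
            try {
                in.read();                       // blocks until data, end of stream, or timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;                     // the socket is still valid after a timeout
            }
        }
    }
}
```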
Conclusion
Java makes network communication easy, through a set of simple classes—URL, ServerSocket, Socket, and InetAddress—that abstract the important concepts in Internetworking. While URLs provide some high-level functionality through getContent, you can easily program right down to the lowest levels using the Socket class.
Using these basic tools, we can easily construct functional systems of cooperating Java objects. While security restrictions often force some inelegance in the design, such systems can still be built with modular, portable, and fairly readable implementations.