Java Internet Programming, Part 3

Thornton Rose

Published as "Java Internet Programming: Level 3", 11/23/99, Gamelan.com.

This is the final installment in a three-part series on writing Java programs that use and implement Internet services. In parts 1 and 2, I covered protocols, ports, sockets, and UDP, and showed both client and server programs. There is much more I would like to write about, but I don't have the space, so I am going to illustrate two programs that I hope you will find interesting: a POP3 client and an HTTP server.

Tools

The tools that you will need for this article are:

The Java Development Kit (JDK)
Your favorite text editor

Also, you will need access to a mail server that supports Post Office Protocol Version 3 (POP3). Most likely, you can use the mail server at your Internet service provider (ISP), your company, or your school.

POP3

Post Office Protocol Version 3 (POP3) is a high-level Net protocol that "is intended to permit a workstation to dynamically access a maildrop on a server host in a useful fashion" [1]. In other words, it is a protocol designed to allow a client program to retrieve mail that is being held at a server.

The POP3 specification (RFC 1939) is straightforward, but like most RFCs, it is not what I would call lunchtime reading. The basic operation of POP3 is:

The server listens for connections on port 110.
A client connects and authenticates.
The client sends commands; the server processes the commands and sends responses back to the client, until the client disconnects or aborts.

POP3 commands are single lines composed of a case-insensitive keyword, optionally followed by one or more parameters, followed by a carriage return plus linefeed (CRLF). Responses from the server may be one or more lines. In either case, the first line is composed of the status (+OK or -ERR), followed by additional information, followed by CRLF. For a multi-line response, the last line is composed of a period (".") followed by CRLF.
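A minimal sketch of reading these responses in Java follows, assuming the socket's input stream has already been wrapped in a BufferedReader. The class and method names (Pop3Response, readStatus, readMultiLine) are hypothetical. One detail from RFC 1939 matters here: in a multi-line response, any data line that begins with a period is sent with an extra period ("byte-stuffed"), so the client strips one leading period before using the line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class Pop3Response {
    // Reads one status line and returns it; throws if the server answered -ERR.
    static String readStatus(BufferedReader in) throws IOException {
        String line = in.readLine();              // readLine() strips the trailing CRLF
        if (line == null) {
            throw new IOException("Connection closed by server");
        }
        if (!line.startsWith("+OK")) {
            throw new IOException("Server error: " + line);
        }
        return line;
    }

    // Reads the body of a multi-line response, up to (not including) the "." line.
    static List<String> readMultiLine(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(".")) {
                return lines;                      // terminating line reached
            }
            // Undo byte-stuffing: a data line starting with "." had one dot added.
            lines.add(line.startsWith(".") ? line.substring(1) : line);
        }
        throw new IOException("Connection closed before terminating \".\"");
    }

    public static void main(String[] args) throws IOException {
        // A canned LIST response standing in for a real server connection.
        String reply = "+OK 2 messages (320 octets)\r\n1 120\r\n2 200\r\n.\r\n";
        BufferedReader in = new BufferedReader(new StringReader(reply));
        System.out.println(readStatus(in));
        for (String entry : readMultiLine(in)) {
            System.out.println("scan listing: " + entry);
        }
    }
}
```

Using a BufferedReader keeps the CRLF handling simple, because readLine() treats "\r\n" as a single line terminator.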

Here is a synopsis of the POP3 commands:

COMMAND	DESCRIPTION
STAT	Get status, referred to as a "drop listing": the number of messages waiting and the size of the maildrop in octets.
LIST [msg]	Get a "scan listing" (message number and size in octets) for all messages, or for the specified message number (e.g., LIST 2).
RETR msg	Retrieve specified message.
DELE msg	Mark specified message for deletion.
NOOP	Do nothing.
RSET	Unmark any messages that were marked for deletion.
QUIT	End the session.
TOP msg n	Retrieve the headers and the first n lines of the body of the specified message.
UIDL [msg]	Get the unique identifier of all messages or the specified message.
USER name	Identify mailbox (username) to access.
PASS pw	Send password for mailbox specified by USER command. (Note: The password is sent in clear text.)
APOP name digest	Send mailbox (username) and an MD5 digest of the timestamp from the server's greeting concatenated with a secret shared by the client and server. This command can be used instead of USER + PASS to authenticate, so that the password is not sent in clear text.
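To show what the APOP digest actually is, here is a minimal sketch that extracts the timestamp (the text between "<" and ">") from the server greeting and computes the lowercase hexadecimal MD5 digest of timestamp + secret. The class name ApopDigest and its methods are hypothetical; the greeting, username, and secret in main are the worked example from RFC 1939, section 7.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ApopDigest {
    // Returns the lowercase hex MD5 digest of timestamp + secret, as APOP expects.
    static String digest(String timestamp, String secret) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] hash = md5.digest((timestamp + secret).getBytes(StandardCharsets.US_ASCII));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b & 0xff));  // two hex digits per byte
        }
        return hex.toString();
    }

    // Extracts the "<...>" timestamp from the server greeting, or null if there is none.
    static String timestamp(String greeting) {
        int start = greeting.indexOf('<');
        int end = greeting.indexOf('>', start + 1);
        return (start >= 0 && end > start) ? greeting.substring(start, end + 1) : null;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Example values from RFC 1939, section 7.
        String greeting = "+OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>";
        String ts = timestamp(greeting);
        System.out.println("APOP mrose " + digest(ts, "tanstaafl"));
    }
}
```

If the greeting contains no timestamp, the server does not support APOP, and a client would fall back to USER and PASS.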

MailStat

MailStat is a program that illustrates the basics of using POP3. It is a simple program that checks for mail on a given server. Here is the algorithm:

Get the command-line parameters: host, username, and password. If they are not specified, print a message and exit.
Get the host's IP address.
Open a socket to the host.
Get references to the socket input and output streams.
Read the reply from the server.
Send the username and read the reply.
Send the password and read the reply.
Get the status (number of messages, mailbox size).
End the session.
Close the socket and exit.
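The algorithm above can be sketched in Java roughly as follows. This is a minimal version under my own assumptions, not the original MailStat source: it uses the standard port 110, sends USER and PASS in clear text, and treats any reply other than +OK as fatal. The helper names command and parseStat are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MailStat {
    static final int POP3_PORT = 110;

    // Parses a STAT reply such as "+OK 2 320" into {messageCount, sizeInOctets}.
    static int[] parseStat(String reply) {
        String[] parts = reply.trim().split("\\s+");
        if (parts.length < 3 || !parts[0].equals("+OK")) {
            throw new IllegalArgumentException("Unexpected STAT reply: " + reply);
        }
        return new int[] { Integer.parseInt(parts[1]), Integer.parseInt(parts[2]) };
    }

    // Sends one command terminated by CRLF and returns the single-line reply.
    static String command(PrintWriter out, BufferedReader in, String cmd) throws IOException {
        out.print(cmd + "\r\n");
        out.flush();
        String reply = in.readLine();
        if (reply == null || !reply.startsWith("+OK")) {
            throw new IOException("Command failed: " + reply);
        }
        return reply;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("Usage: java MailStat host username password");
            return;
        }
        InetAddress address = InetAddress.getByName(args[0]);  // the host's IP address
        try (Socket socket = new Socket(address, POP3_PORT)) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            PrintWriter out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII));

            String greeting = in.readLine();                   // the server speaks first
            if (greeting == null || !greeting.startsWith("+OK")) {
                throw new IOException("Bad greeting: " + greeting);
            }
            command(out, in, "USER " + args[1]);
            command(out, in, "PASS " + args[2]);               // sent in clear text
            int[] stat = parseStat(command(out, in, "STAT"));
            System.out.println(stat[0] + " message(s), " + stat[1] + " octets");
            command(out, in, "QUIT");                          // end the session
        }  // try-with-resources closes the socket
    }
}
```

Usage would be something like `java MailStat mail.example.com jdoe secret`, where the host, username, and password are placeholders for your own account.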

I could have done more with MailStat, but I wanted to keep it simple, so that "features" did not get in the way of illustrating POP3. It was meant to be a stepping stone. Some suggestions for making it more useful are:

Check for mail every x minutes
Check more than one mailbox
Add a graphical user interface

HTTP

Hypertext Transfer Protocol (HTTP), another high-level Net protocol, "is the standard protocol for communication between web browsers and web servers" [2]. HTTP 1.1 is specified in RFC 2616, and HTTP 1.0 in RFC 1945.

The basic operation of HTTP is:

The server listens for connections on port 80.
A client connects and sends a request.
The server sends a response, which includes the contents of a file if it was requested.
The connection is closed by the client, the server, or both.

The general form of a client request is:

{command} /{url} {HTTP-version}CRLF
[{keyword}: {value}CRLF]
...
CRLF

{command}	=	The command that the server should process: GET - retrieve a file, HEAD - retrieve only the response header for a file, POST - send form data, or PUT - upload a file.
{url}	=	The relative URL of a file on the server.
{HTTP-version}	=	The version of HTTP that the client understands: HTTP/1.0, HTTP/1.1, etc.
{keyword}	=	Keyword to identify extra information that is being provided to the server. Some common keywords are Accept - specify what data can be handled, and User-Agent - identify the browser.
CRLF	=	Carriage return (ASCII 13) + line feed (ASCII 10)
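To make the request format concrete, here is a minimal sketch of a parser for it. The class HttpRequest and its fields are hypothetical names. It reads the request line, then reads keyword/value lines until the blank line that ends the request header. Because keywords are case-insensitive, it stores them in lowercase, and it joins repeated keywords (such as several Accept lines) with commas, which RFC 2616 allows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HttpRequest {
    final String command;   // e.g. GET
    final String url;       // e.g. /foo.html
    final String version;   // e.g. HTTP/1.1
    final Map<String, String> headers = new LinkedHashMap<>();

    HttpRequest(String command, String url, String version) {
        this.command = command;
        this.url = url;
        this.version = version;
    }

    // Reads the request line and keyword lines up to the blank line that ends them.
    static HttpRequest parse(BufferedReader in) throws IOException {
        String requestLine = in.readLine();
        if (requestLine == null) {
            throw new IOException("Empty request");
        }
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IOException("Malformed request line: " + requestLine);
        }
        HttpRequest request = new HttpRequest(parts[0], parts[1], parts[2]);
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                String keyword = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                // Repeated keywords are combined into one comma-separated value.
                request.headers.merge(keyword, value, (a, b) -> a + ", " + b);
            }
        }
        return request;
    }

    public static void main(String[] args) throws IOException {
        String raw = "GET /foo.html HTTP/1.1\r\nAccept: text/html\r\nAccept: image/gif\r\n"
                   + "User-Agent: Netscape/4.5\r\n\r\n";
        HttpRequest r = parse(new BufferedReader(new StringReader(raw)));
        System.out.println(r.command + " " + r.url + " " + r.version);
        System.out.println(r.headers);
    }
}
```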

Here are some examples:

GET /foo.html HTTP/1.1

GET /foo.html HTTP/1.1
Accept: text/html
Accept: text/plain
Accept: image/gif
Accept: image/jpeg
User-Agent: Netscape/4.5

The general form of a server response is:

{HTTP-version} {response-code}CRLF
Server: {server-identity}CRLF
MIME-version: {MIME-version}CRLF
Content-type: {content-type}CRLF
Content-length: {content-length}CRLF
CRLF
{data}

{HTTP-version}	=	The HTTP version that is being used by the server.
{response-code}	=	The response code, which has two parts: a response number and a message. The most common response codes are "200 OK" and "404 Not Found". Response codes 200-299 indicate success, 300-399 indicate redirection, 400-499 indicate a client error, and 500-599 indicate a server error.
{server-identity}	=	The server identity.
{MIME-version}	=	The version of MIME that is being used by the server.
{content-type}	=	The MIME type of the content data: text/html, image/gif, etc.
{content-length}	=	The length of the content data in bytes.
{data}	=	The content data.
CRLF	=	Carriage return (ASCII 13) + line feed (ASCII 10)
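A small sketch of building a response header in this form follows. The class name HttpResponseHeader and the server identity "HttpServer/0.1" are made up for illustration. Note that {content-length} counts the bytes of the data that follows the blank line, not characters of the header.

```java
import java.nio.charset.StandardCharsets;

public class HttpResponseHeader {
    // Builds the header block in the general form above, ending with a blank line.
    static String build(String version, String responseCode, String server,
                        String contentType, long contentLength) {
        String crlf = "\r\n";
        return version + " " + responseCode + crlf
             + "Server: " + server + crlf
             + "MIME-version: 1.0" + crlf
             + "Content-type: " + contentType + crlf
             + "Content-length: " + contentLength + crlf
             + crlf;                                  // blank line ends the header
    }

    public static void main(String[] args) {
        String body = "<html>\n<head><title>Foo</title></head>\n<body>Foo</body>\n</html>\n";
        byte[] data = body.getBytes(StandardCharsets.US_ASCII);
        // Content-length is the byte count of the data, computed from the encoded bytes.
        String header = build("HTTP/1.0", "200 OK", "HttpServer/0.1", "text/html", data.length);
        System.out.print(header + body);
    }
}
```

With newline line endings, the four-line page in main is 64 bytes long.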

Here is an example:

HTTP/1.1 200 OK
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Content-length: 64

<html>
<head><title>Foo</title></head>
<body>Foo</body>
</html>

HttpServer

HttpServer is a very simple HTTP (web) server. Note that, like MailStat, HttpServer is lean. It is meant to be a starting point that illustrates HTTP. I implemented it with only two classes, and it supports only the GET command for HTML, text, GIF, and JPEG files. Here is the algorithm:

Get the command-line parameter: port. If the port is not specified, set it to 80.
Open a server socket on the specified port.
Begin loop.
1. Wait for client connection, getting a reference to the client socket when the connection is received.
2. Create and start a new client request thread, passing it the client socket.
End loop.
Close the server socket and exit.
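The main loop above might be sketched like this, with one thread per connection as described. This is not the original HttpServer source. RequestHandler is a hypothetical placeholder for the article's HttpRequestThread: it only consumes the request and answers "501 Not Implemented", so that the sketch compiles and runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HttpServer {
    static final int DEFAULT_PORT = 80;

    // Returns the port given on the command line, or 80 if none was given.
    static int portFromArgs(String[] args) {
        return args.length > 0 ? Integer.parseInt(args[0]) : DEFAULT_PORT;
    }

    public static void main(String[] args) throws IOException {
        int port = portFromArgs(args);
        // try-with-resources closes the server socket if accept() ever throws.
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();                 // wait for a connection
                new Thread(new RequestHandler(client)).start();  // one thread per client
            }
        }
    }
}

// Hypothetical placeholder for HttpRequestThread: reads the request, answers 501.
class RequestHandler implements Runnable {
    private final Socket client;

    RequestHandler(Socket client) {
        this.client = client;
    }

    @Override
    public void run() {
        try (Socket s = client) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            // Consume the request line and keyword lines, up to the blank line.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // A real handler would parse these lines here.
            }
            String body = "Not implemented\n";
            String response = "HTTP/1.0 501 Not Implemented\r\n"
                + "Content-type: text/plain\r\n"
                + "Content-length: " + body.length() + "\r\n\r\n" + body;
            OutputStream out = s.getOutputStream();
            out.write(response.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

On most Unix systems, only root can listen on ports below 1024, so for testing you might run `java HttpServer 8080` and browse to port 8080.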

HttpRequestThread handles the client requests. Here is the algorithm:
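As a hedged sketch only (my own assumptions, not the article's HttpRequestThread), the core of such a handler might parse the request line, accept only GET, map the file extension to one of the four supported MIME types, and send either the file or an error. The class GetHandler, its methods, and the choice to serve files from a given root directory are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;

public class GetHandler {
    // Maps a file extension to one of the four supported MIME types, or null.
    static String contentTypeFor(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.endsWith(".html") || p.endsWith(".htm")) return "text/html";
        if (p.endsWith(".txt")) return "text/plain";
        if (p.endsWith(".gif")) return "image/gif";
        if (p.endsWith(".jpg") || p.endsWith(".jpeg")) return "image/jpeg";
        return null;
    }

    // Answers one request line by sending the requested file from root, or an error.
    static void respond(String requestLine, File root, OutputStream out) throws IOException {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equals("GET")) {
            sendError(out, "501 Not Implemented");
            return;
        }
        String url = parts[1];
        String type = contentTypeFor(url);
        File file = new File(root, url);
        // Refuse ".." so a request cannot escape the document root.
        if (url.contains("..") || type == null || !file.isFile()) {
            sendError(out, "404 Not Found");
            return;
        }
        byte[] data = Files.readAllBytes(file.toPath());
        String header = "HTTP/1.0 200 OK\r\n"
            + "Content-type: " + type + "\r\n"
            + "Content-length: " + data.length + "\r\n\r\n";
        out.write(header.getBytes(StandardCharsets.US_ASCII));
        out.write(data);
        out.flush();
    }

    // Sends a short plain-text error response with the given response code.
    static void sendError(OutputStream out, String responseCode) throws IOException {
        String body = responseCode + "\n";
        String header = "HTTP/1.0 " + responseCode + "\r\n"
            + "Content-type: text/plain\r\n"
            + "Content-length: " + body.length() + "\r\n\r\n";
        out.write((header + body).getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        respond("GET /missing.html HTTP/1.0", new File("."), out);
        System.out.print(out.toString("US-ASCII"));
    }
}
```

Writing to an OutputStream rather than directly to a socket keeps the handler easy to test; in a server, you would pass it the client socket's output stream.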

Summary

This is the end of the line for this series of articles. Hopefully, my example programs were interesting and provided you with good starting points for your own Internet programming in Java.

References

[1] Myers, J. (Carnegie Mellon), and Rose, M. (Dover Beach Consulting, Inc.). "RFC 1939, Post Office Protocol - Version 3 (POP3)." May 1996.
[2] Harold, Elliotte Rusty. Java Network Programming. O'Reilly & Associates, Inc., 1997.