CSCI 3325
Distributed Systems

Bowdoin College
Spring 2022
Instructor: Sean Barker

Project 2 - Online Bookstore

Assigned: Tuesday, February 22.
Groups Due: Thursday, February 24, 11:59 pm.
Code Due Date: Sunday, March 6, 11:59 pm.
Writeup Due Date: 48 hours after code due.

In this project, you will design and implement a multi-tier distributed system using remote procedure calls (RPCs) as the core distributed programming model. You will also experiment with your system to see how query loads affect performance.

This project should be done in teams of two or three (unless otherwise cleared with me). All team members are expected to work on all parts of the project.

System Specification

You have been tasked to design Nile.com: the world's smallest online bookstore. Since Nile.com plans to one day overtake Amazon, you will need to follow sound design principles in building the online store to allow for future growth.

The store will employ a two-tier design: a front-end and a back-end. The front-end tier will accept user requests (i.e., communicate with clients) and perform initial processing. The back-end consists of two components: a catalog server and an order server. The catalog server maintains information on current inventory, while the order server is responsible for keeping the catalog appropriately updated.

A pictorial representation of the system is shown in the figure below.

[Figure: system architecture]

Bookstore inventory is tracked within the system as a series of book entries, where each book entry consists of the following:

  1. an item number (e.g., 793)
  2. the book name (e.g., "Socket Programming for Dummies")
  3. the book topic/category (e.g., "self-help")
  4. the number of copies of the book in stock (e.g., 25)
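
The four fields above map naturally onto a small Java class. The following is only a sketch of one possible representation; the class name BookEntry and its method names are assumptions made for illustration and are not part of the starter code. Thread safety is deliberately not handled here.

```java
// Hypothetical sketch of one catalog entry; all names here are illustrative assumptions.
final class BookEntry {
    private final int itemNumber;  // e.g., 793
    private final String name;     // e.g., "Socket Programming for Dummies"
    private final String topic;    // e.g., "self-help"
    private int stock;             // number of copies in stock, e.g., 25

    BookEntry(int itemNumber, String name, String topic, int stock) {
        if (stock < 0) {
            throw new IllegalArgumentException("stock cannot be negative");
        }
        this.itemNumber = itemNumber;
        this.name = name;
        this.topic = topic;
        this.stock = stock;
    }

    int getItemNumber() { return itemNumber; }
    String getName() { return name; }
    String getTopic() { return topic; }
    int getStock() { return stock; }

    // Adjusts the stock by delta (positive or negative); not synchronized in this sketch.
    void adjustStock(int delta) {
        if (stock + delta < 0) {
            throw new IllegalStateException("not enough copies in stock");
        }
        stock += delta;
    }

    @Override
    public String toString() {
        return String.format("#%d \"%s\" [%s] (%d in stock)", itemNumber, name, topic, stock);
    }
}
```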

The various components of the system will interact with the inventory via a set of operations, as detailed below.

Server Operations

The front-end server should support three operations (which are invoked by clients): search, lookup, and buy.

The search and lookup operations trigger queries to the catalog server, while the buy operation triggers a request to the order server.

The catalog server should support two operations:

The order server should support a single operation:

Data Requirements

Initial inventory stock may be determined arbitrarily. New stock should arrive automatically and periodically (e.g., every 30 seconds a new copy of each book is added; the details of this are up to you). The actual set of books (and topics) contained in the catalog is also up to you. While your design and data structures should be flexible enough to handle any catalog, the objective here is not to actually manage a large dataset, so a small number of books (e.g., 10 books spread over 3 topics) is sufficient. In a real service, the catalog would likely be stored in a full-fledged database server or similar. Here, you do not need to store the catalog on disk at all; keeping the catalog data as an in-memory data structure in the catalog server is fine.
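
As one possible way to realize the periodic restocking described above, the sketch below keeps stock counts in an in-memory map and uses a scheduled task to add one copy of each book at a fixed interval. The class name RestockingCatalog, its methods, and the choice of a map keyed by item number are all assumptions for illustration; the details of restocking are left up to you.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stock table with periodic restocking (names are assumptions).
final class RestockingCatalog {
    // Maps item number -> copies in stock.
    private final Map<Integer, Integer> stock = new ConcurrentHashMap<>();

    // Single daemon thread so the scheduler never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "restocker");
                t.setDaemon(true);
                return t;
            });

    void addBook(int itemNumber, int copies) {
        stock.put(itemNumber, copies);
    }

    int getStock(int itemNumber) {
        return stock.getOrDefault(itemNumber, 0);
    }

    // Adds one copy of every book; each entry is updated atomically.
    void restockAll() {
        stock.replaceAll((item, copies) -> copies + 1);
    }

    // Schedules restockAll to run every periodSeconds seconds.
    void startRestocking(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::restockAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

For example, calling startRestocking(30) on such an object would correspond to the "every 30 seconds" policy mentioned above.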

Technical Requirements

All servers should be written in Java using XML-RPC for remote communication. In addition to the three server programs, you will write two clients - one in Java and one in Python. Both clients should communicate with the same Java front-end server program using XML-RPC.

No GUIs are required for any servers or clients. Simple command line interfaces are fine (e.g., for issuing requests via a client). The exact format of the client interface is up to you, but should include typical information and operate in an intuitive way (e.g., showing the results of a query in a nicely formatted manner, printing whether purchase requests succeeded or failed, etc.).

The service must support multiple concurrent queries; i.e., multiple clients should be able to have their queries serviced without having to wait for other queries to complete first. In contrast, purchases do not need to support concurrency; while multiple clients may of course issue purchase requests simultaneously, the order server does not need to actually perform them concurrently.

Be aware of thread synchronization issues to avoid inconsistency or errors in your system. For instance, two concurrent purchase requests should not both be able to buy a single remaining copy of a book. You may find Java's synchronization primitives helpful here (described more below).
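
To illustrate the single-remaining-copy scenario, here is a minimal sketch in plain Java (no XML-RPC involved). The class names Inventory and PurchaseRaceDemo are hypothetical. Because the check and the decrement happen inside one synchronized method, only one of several racing buyers can succeed when a single copy remains.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stock counter for a single book (names are assumptions).
final class Inventory {
    private int copies;

    Inventory(int copies) {
        this.copies = copies;
    }

    // The check and the decrement form one atomic step, so two threads
    // cannot both observe copies == 1 and both decrement.
    synchronized boolean buyOne() {
        if (copies <= 0) {
            return false;
        }
        copies--;
        return true;
    }

    synchronized int remaining() {
        return copies;
    }
}

final class PurchaseRaceDemo {
    // Starts `buyers` threads that each attempt one purchase;
    // returns how many of the attempts succeeded.
    static int race(Inventory inventory, int buyers) {
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[buyers];
        for (int i = 0; i < buyers; i++) {
            threads[i] = new Thread(() -> {
                if (inventory.buyOne()) {
                    successes.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return successes.get();
    }
}
```

Without the synchronized keyword on buyOne, two threads could both pass the stock check before either decrements, which is exactly the inconsistency described above.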

Some of the programs will need to be given one or more hostnames when started in order to connect to the rest of the system. These hostnames should be provided as command-line arguments to your programs. In particular:

Note that the above dictates the order in which the system should be launched: first the catalog server, then the order server, then the front-end server, and finally one or more clients.
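
A small helper along the following lines could validate the hostname arguments and turn a hostname into a server address. This is only a sketch: the class name HostArgs, the port number 8080, and the "/" path are assumptions chosen for illustration, not values taken from the starter code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for hostname command-line arguments (all names are assumptions).
final class HostArgs {
    static final int ASSUMED_PORT = 8080; // illustrative value only

    // Returns a copy of args if exactly `expected` hostnames were given;
    // otherwise throws with a usage message.
    static String[] requireHosts(String[] args, int expected, String usage) {
        if (args.length != expected) {
            throw new IllegalArgumentException("Usage: " + usage);
        }
        return args.clone();
    }

    // Builds an http address for the given host on the assumed port.
    static URI serverUri(String host) {
        try {
            return new URI("http", null, host, ASSUMED_PORT, "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad hostname: " + host, e);
        }
    }
}
```

A main method might then begin with something like requireHosts(args, 1, "java OrderServer <catalog-host>"), where the usage string shown is again purely hypothetical.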

Implementation Advice

Example XML-RPC Code

To help you get started with XML-RPC, an example server and two clients are provided below (which are also included in the starter code):

As a place to start, try running the server and clients on the class server and make sure you understand the basic structure of the code.

Accessing the XML-RPC Java Classes

To compile your Java files, you will need to include the Apache XML-RPC Java classes in your "classpath", which is basically the set of directories that Java uses to search for included classes. To make things easier for you, the starter code includes a functional Makefile that will invoke javac with the right classpath arguments to compile your code. The starter code includes Java files for the three server programs and the client program, as well as a Python skeleton for the second client. You will likely need to write additional Java classes, but you should not need to write any other classes containing main methods that will be executed directly.

While the Makefile will take care of the classpath when compiling with javac, you also need to specify the classpath when executing the code with java. There are two ways to do this. The simple but cumbersome method is to specify the classpath as a command-line argument every time you run your code, as in the following command (all on one line, and the same for every class except for the class name to execute):

java -cp /usr/share/java/xmlrpc-client.jar:/usr/share/java/xmlrpc-server.jar:/usr/share/java/xmlrpc-common.jar:/usr/share/java/ws-commons-util.jar:/usr/share/java/apache-commons-logging.jar:. CatalogServer

A better option is to set your classpath by adding the following line to your .bashrc file (note the leading period) in your home directory on the class server (use this exact text, which should all be on one line):

export CLASSPATH="/usr/share/java/xmlrpc-client.jar:/usr/share/java/xmlrpc-server.jar:/usr/share/java/xmlrpc-common.jar:/usr/share/java/ws-commons-util.jar:/usr/share/java/apache-commons-logging.jar:.:$CLASSPATH"

Once you've done this, every new shell process will have the classpath automatically set up, and you should be able to run your code using java without having to specify the classpath.

For the Python client, note that you must be using Python 3, but simply calling python on turing defaults to v2. To run Python 3, use the python3 command instead (or, you can execute ./client.py directly). Whether python defaults to v2 or v3 may differ on another machine; you can run python --version to check.

If you want to develop in Java on your local machine, you may need to download the XML-RPC classes. You will need to add the jarfiles to your local or IDE classpath in order to use them (but note that the paths may be different from those specified in the provided Makefile).

Documentation on the XML-RPC classes (e.g. Javadoc) is available on the Apache XML-RPC site.

Stateless Servers

XML-RPC is an example of a stateless protocol, which means that no information (or state) is automatically maintained across multiple communication calls. In other words, every XML-RPC request completely stands alone and is not intrinsically related to any other request. Other examples of stateless protocols include HTTP and IP, while TCP (in contrast) is an example of a stateful protocol (since TCP messages are inherently part of a stream of messages between two hosts, and this information must be tracked across messages as part of the protocol).

The main practical implication of the statelessness of XML-RPC is that the server will create a new server object (e.g., an instance of the Server class in the example code) for every new RPC request. This approach simplifies XML-RPC itself and makes it easy to handle lots of simultaneous requests, but may complicate your program design a bit to accommodate this. You can see the impact of statelessness yourself by adding an instance variable to the Server class, modifying it and printing it out within the RPC, then making multiple requests to the server.
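
The effect of per-request object creation can be imitated in plain Java without any XML-RPC classes. In the sketch below, which is a simulation and not actual XML-RPC code, the hypothetical CounterHandler class stands in for a handler class, and simulateRequests creates a fresh instance for each simulated request. The instance field starts over every time, while the static field persists across requests; static state is one possible way (among others) to share data between handler objects.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an RPC handler class; names are assumptions for illustration.
final class CounterHandler {
    // Instance state: starts at zero in every newly created handler object.
    private int perInstanceCalls = 0;

    // Static state: shared by all handler objects in the same JVM.
    private static final AtomicInteger sharedCalls = new AtomicInteger();

    // Returns {calls seen by this object, calls seen by all objects}.
    int[] handle() {
        perInstanceCalls++;
        return new int[] { perInstanceCalls, sharedCalls.incrementAndGet() };
    }
}

final class StatelessDemo {
    // Mimics a server that creates a new handler object for each request;
    // returns the counters reported by the last request.
    static int[] simulateRequests(int n) {
        int[] last = null;
        for (int i = 0; i < n; i++) {
            last = new CounterHandler().handle();
        }
        return last;
    }
}
```

After several simulated requests, the per-instance counter is still 1, while the shared counter equals the total number of requests made so far.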

For more details and an example of the new object creation issue discussed above, see this XML-RPC reference page.

Concurrency and Synchronization

Your servers will need to be careful about protecting data that may be accessed by multiple requests concurrently. In Java, a standard way to perform synchronization is via the synchronized keyword, which allows you to (among other things) mark specific methods as only safe for execution by one thread at a time. Doing so ensures that the method will not be executed concurrently, which is good for safety but potentially bad for performance. Thus, the essential challenge of synchronization is to provide safety when necessary but still allow for enough parallelism to achieve good performance.
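
As an illustration of the safety versus parallelism trade-off, the sketch below uses a read-write lock from java.util.concurrent instead of synchronized methods: any number of queries may hold the read lock at once, while an update takes the exclusive write lock. This is just one option and is not required by the assignment; the class name StockTable and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stock table allowing concurrent reads but exclusive writes.
final class StockTable {
    private final Map<Integer, Integer> stock = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may run query at the same time.
    int query(int itemNumber) {
        lock.readLock().lock();
        try {
            return stock.getOrDefault(itemNumber, 0);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Only one thread may run update at a time, and never alongside a query.
    // Returns false (leaving the table unchanged) if stock would go negative.
    boolean update(int itemNumber, int delta) {
        lock.writeLock().lock();
        try {
            int updated = stock.getOrDefault(itemNumber, 0) + delta;
            if (updated < 0) {
                return false;
            }
            stock.put(itemNumber, updated);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Marking both methods synchronized would be equally safe but would force queries to wait for one another, which is the performance cost described above.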

You may find Oracle's tutorial on synchronization in Java useful if you have not done much concurrent programming previously.

Testing and Evaluation

For initial testing, you can run all components of the system on the same physical machine. For later testing, you will be provided a set of 4 separate machines for running each component of the system separately - details on these machines will be provided later. You will need to run a set of performance experiments and provide results in your writeup (described below).

Writeup

Once you have implemented your system, you should write a short paper (2-4 pages) that describes your system. The general framework of this paper should be similar to the outline described in Project 1. As part of your evaluation, you should include results of the following experiments:

  1. Compute the average response time per client search request by measuring the response time seen by a client for 500 sequential requests.
  2. Now compute the average response time per buy request for 500 sequential requests.
  3. Rerun experiments 1) and 2) with multiple clients concurrently making requests to the system. Does your average response time change as the number of concurrent requests changes?
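
One possible shape for a measurement harness is sketched below. The class name ResponseTimer and its methods are assumptions for illustration; the request to be timed is passed in as a Runnable, so the same code could wrap a search or a buy call. The concurrent variant simply runs the sequential measurement in several threads at once and averages the per-client results.

```java
// Hypothetical timing harness; names and structure are assumptions.
final class ResponseTimer {
    // Mean time in milliseconds per call over `count` back-to-back calls.
    static double averageMillis(Runnable request, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            request.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / count;
    }

    // Runs `clients` threads, each making `perClient` sequential calls,
    // and returns the mean of the per-client averages (in milliseconds).
    static double averageMillisConcurrent(Runnable request, int clients, int perClient) {
        double[] averages = new double[clients];
        Thread[] threads = new Thread[clients];
        for (int c = 0; c < clients; c++) {
            final int id = c;
            threads[c] = new Thread(() -> averages[id] = averageMillis(request, perClient));
            threads[c].start();
        }
        double total = 0;
        for (int c = 0; c < clients; c++) {
            try {
                threads[c].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            total += averages[c];
        }
        return total / clients;
    }
}
```

For experiment 1, for instance, the Runnable would issue one search request through your client code and averageMillis would be called with a count of 500. Note that this harness reports only the mean; recording individual response times would be needed for anything beyond that.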

The exact setup of how you run your experiments is up to you, but your experimental design should be sufficient for answering these questions, and your setup should be described in your writeup in enough detail that someone else would be able to replicate your experiments. Include graphs where appropriate to demonstrate your results.

Logistics and Evaluation

As with Project 1, accept the project Git repository using the link on Blackboard (at which point you will join your group as well). You can then clone this repository to turing and begin to work.

Your program will be graded on (1) correctly implementing the server specification, (2) the design and style of your program, and (3) the quality of your writeup. For guidance on what constitutes good design and style, see the Coding Design & Style Guide, which lists many common things to look for. Please ask if you have any other questions about design or style issues.