Distributed Systems Java Socket Based P2P System

Description

The system I designed is a Peer-to-Peer (P2P) file-sharing system built on a Distributed Hash Table (DHT). A main server maintains a Universal Hashed Peer Table (UHPT) and a Universal Hashed Resource Table (UHRT). Each UHPT entry stores the GUID of a peer together with its IP address, port number and Routing Metric; the UHPT is a hash map keyed by peer GUID. Each UHRT entry stores the GUID of a resource together with the GUIDs of its owners; the UHRT is a hash map keyed by resource GUID. Each peer plays two roles: a client that requests resources and a server that provides them. To support this, each peer maintains a Distributed Hashed Resource Table (DHRT) and a table of the resources it owns. Each DHRT entry stores the GUID of a resource together with the contact information of its owner peer and the file name of the resource. Data transfer is implemented with TCP sockets. The details are described below.

Peer Server Opening

As introduced at the beginning, each peer is also a server that listens for requests and provides resources. This section describes how a peer opens its server port.

When a peer is about to open its server, it first informs the main server by creating a new socket connection to it. The message states that the peer is opening and includes the peer’s IP address and port number as well as the GUIDs of all of the peer’s resources. Resource GUIDs are generated from the content of each resource with the SHA-1 algorithm, so that resources are identified by their contents.
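A minimal sketch of content-based GUID generation, assuming the GUID is simply the SHA-1 digest of the file bytes written as lowercase hex. The class and method names are illustrative and are not taken from the original implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ResourceGuid {
    private ResourceGuid() {}

    /** Streams the content through SHA-1 and returns the digest as 40 hex characters. */
    static String fromContent(InputStream in) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            sha1.update(buffer, 0, read); // hash chunk by chunk, so large files are not loaded at once
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the whole file has to be read to compute the digest, the cost of this step grows with the total size of the shared resources.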

After the main server receives the message, it first generates a GUID for the peer using the UUID algorithm so that all peer GUIDs are unique. It then stores in its UHPT the GUID of the peer together with its IP address, port number and Routing Metric, which is derived from the number of resources the peer owns. For each of the peer’s resources, if the resource is already in the UHRT, the peer’s GUID is added to the corresponding owner list. Otherwise, the GUID of the resource and the GUID of the peer are stored in the UHRT as a new entry. When all of this is done, the main server returns a message telling the peer the GUID generated for it, and the peer remembers that GUID. The main server then closes the transmission socket. After that, the peer closes its side of the socket and opens its server socket to listen for resource requests. The figures below show the messages output by a peer and the main server when the peer opens its server.
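The registration step can be sketched as follows. This is only an illustration under stated assumptions: the class, field and method names are hypothetical, and the Routing Metric is assumed to be simply the number of resources announced.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class MainServerTables {
    /** One UHPT entry: how to reach a peer, plus its routing metric. */
    static final class PeerInfo {
        final String host;
        final int port;
        final int routingMetric;

        PeerInfo(String host, int port, int routingMetric) {
            this.host = host;
            this.port = port;
            this.routingMetric = routingMetric;
        }
    }

    final Map<String, PeerInfo> uhpt = new HashMap<>();    // peer GUID -> contact info
    final Map<String, Set<String>> uhrt = new HashMap<>(); // resource GUID -> owner peer GUIDs

    /** Handles an "opening" message and returns the GUID assigned to the peer. */
    synchronized String registerPeer(String host, int port, List<String> resourceGuids) {
        String peerGuid = UUID.randomUUID().toString();
        // Assumption: the routing metric is the resource count at opening time.
        uhpt.put(peerGuid, new PeerInfo(host, port, resourceGuids.size()));
        for (String resourceGuid : resourceGuids) {
            // Create the owner set on first sight, otherwise extend the existing one.
            uhrt.computeIfAbsent(resourceGuid, k -> new HashSet<>()).add(peerGuid);
        }
        return peerGuid;
    }
}
```

The method is `synchronized` on the assumption that several peers may open at the same time, each on its own connection.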

Get Resource

In this system, we assume that all peers know which resources exist in the P2P system. When a peer wants a resource, it first checks whether its DHRT contains the resource. If not, the peer opens a socket connection to the main server and requests the contact information of an owner peer. The main server looks up the owner list of the resource in its UHRT and compares the owners’ Routing Metrics to find the best-matched owner peer. It returns the IP address and port number of that owner to the requesting peer, and the socket connection between the peer and the main server is then closed. If the DHRT does contain the resource, the peer opens a new socket connection and sends the request directly to the owner using the recorded contact information. If the peer that receives the request still has the resource, it sends it; if it has removed the resource, it replies that it no longer has it. In that case the requesting peer deletes the entry for the resource from its DHRT and sends a request for the resource to the main server. After these operations, the socket is closed. The main server must know the resource, because we have assumed that peers know the existing resources and therefore never request a resource that doesn’t exist.
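The lookup order described above (DHRT first, main server as fallback) can be sketched like this. The two interfaces are hypothetical stand-ins for the socket round trips, and all names are illustrative rather than taken from the original code.

```java
import java.util.HashMap;
import java.util.Map;

final class ResourceLocator {
    /** Contact information for an owner peer, as kept in a DHRT entry. */
    static final class Contact {
        final String host;
        final int port;

        Contact(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Hypothetical stand-in for asking the main server for the best owner. */
    interface MainServer {
        Contact findOwner(String resourceGuid);
    }

    /** Hypothetical stand-in for the transfer; false means the owner no longer has the resource. */
    interface Downloader {
        boolean download(Contact owner, String resourceGuid);
    }

    private final Map<String, Contact> dhrt = new HashMap<>(); // resource GUID -> owner contact
    private final MainServer mainServer;
    private final Downloader downloader;

    ResourceLocator(MainServer mainServer, Downloader downloader) {
        this.mainServer = mainServer;
        this.downloader = downloader;
    }

    void remember(String resourceGuid, Contact owner) { dhrt.put(resourceGuid, owner); }

    Contact cached(String resourceGuid) { return dhrt.get(resourceGuid); }

    boolean fetch(String resourceGuid) {
        Contact known = dhrt.get(resourceGuid);
        if (known != null) {
            if (downloader.download(known, resourceGuid)) {
                return true; // served directly, main server not contacted
            }
            dhrt.remove(resourceGuid); // recorded owner has removed the resource
        }
        Contact owner = mainServer.findOwner(resourceGuid);
        boolean ok = downloader.download(owner, resourceGuid);
        if (ok) {
            dhrt.put(resourceGuid, owner); // later requests can skip the main server
        }
        return ok;
    }
}
```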

During the transfer, the receiving peer sends the GUID of the resource to the owner peer. The owner peer first sends the file name of the resource, and the receiver checks whether it already owns a file with the same name. (Even if the name is the same, the content cannot be the same: the GUID of each resource is generated from its content and a peer knows the GUIDs of all its resources, so it never requests a resource whose GUID matches one it already owns.) If there is a file with the same name, the requested resource is saved under its GUID instead of its original file name. Then the owner peer sends the length of the resource and starts to transfer the file stream. When the transfer is complete, the receiver checks whether the number of bytes received equals the length of the resource. If it does, the transfer was successful and the receiver (1) stores the resource’s GUID together with the contact information and the file name of the resource in its DHRT, (2) stores the resource and its file name in the table that records the resources it owns, and (3) informs the main server that it owns a new resource by sending a message that states the message type and includes its own GUID and the resource’s GUID. The main server then updates its UHRT according to that message. If the number of bytes received does not equal the length of the resource, the receiver knows that an error occurred during the transfer and deletes the data it received. The picture below shows the case where a peer requests a resource using the contact information recorded in its DHRT, but the corresponding owner peer has removed that resource.

Image 1
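The transfer steps (file name, then length, then the bytes, followed by a length check on the receiver) can be sketched with `DataOutputStream` and `DataInputStream`. The exact wire format here (`writeUTF` for the name, `writeLong` for the length) is an assumption made for illustration, and the class names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ResourceTransfer {
    /** What the receiver learned from one transfer. */
    static final class Received {
        final String fileName;
        final long declaredLength;
        final long bytesReceived;

        Received(String fileName, long declaredLength, long bytesReceived) {
            this.fileName = fileName;
            this.declaredLength = declaredLength;
            this.bytesReceived = bytesReceived;
        }

        /** True only if every announced byte arrived. */
        boolean complete() { return bytesReceived == declaredLength; }
    }

    /** Owner side: announce name and length, then stream the content. */
    static void send(String fileName, long length, InputStream content, OutputStream socketOut)
            throws IOException {
        DataOutputStream out = new DataOutputStream(socketOut);
        out.writeUTF(fileName);
        out.writeLong(length);
        byte[] buffer = new byte[8192];
        int n;
        while ((n = content.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    /** Receiver side: read the header, copy at most the announced number of bytes, count them. */
    static Received receive(InputStream socketIn, OutputStream target) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        String fileName = in.readUTF();
        long declared = in.readLong();
        byte[] buffer = new byte[8192];
        long received = 0;
        while (received < declared) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, declared - received));
            if (n == -1) {
                break; // stream ended early: the transfer is incomplete
            }
            target.write(buffer, 0, n);
            received += n;
        }
        return new Received(fileName, declared, received);
    }
}
```

In the real system the two streams would come from `Socket.getOutputStream()` and `Socket.getInputStream()`. When `complete()` returns false, the receiver would delete the partial file, as described above.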

Add File

In this system, peers can add new files to the directories they share as resources. To do that, the peer first chooses a file to add and checks whether its content is the same as that of any file it already owns. If it is, the file can’t be added. Otherwise, the file is copied into the shared directory and, if the copy succeeds, the peer opens a new socket connection and sends an informing message to the main server. The message first states its type and then gives the GUID of the peer and the GUID of the added resource. Once the main server has received the message, the socket is closed and the main server updates its UHRT accordingly.

Remove File

In this system, peers can also remove files from their shared directories. To do that, the peer first chooses a file to remove from its directory. If the file is deleted successfully, the peer informs the main server of the change over a new socket connection. The message first states its type and then gives the GUID of the peer and the GUID of the removed resource. Once the main server has received the message, the socket is closed and the main server updates its UHRT accordingly.
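The add and remove notifications share one shape: a message type followed by the peer GUID and the resource GUID. A minimal sketch of writing such a message and applying it to the UHRT is shown here. The type names and the use of `writeUTF` are assumptions for illustration, not the original wire format.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ResourceNotice {
    /** Hypothetical message types for the two notifications. */
    enum Type { ADD_RESOURCE, REMOVE_RESOURCE }

    private ResourceNotice() {}

    /** Peer side: type first, then peer GUID, then resource GUID. */
    static void write(DataOutputStream out, Type type, String peerGuid, String resourceGuid)
            throws IOException {
        out.writeUTF(type.name());
        out.writeUTF(peerGuid);
        out.writeUTF(resourceGuid);
        out.flush();
    }

    /** Main-server side: reads one notice and updates the UHRT (resource GUID -> owner GUIDs). */
    static void apply(DataInputStream in, Map<String, Set<String>> uhrt) throws IOException {
        Type type = Type.valueOf(in.readUTF());
        String peerGuid = in.readUTF();
        String resourceGuid = in.readUTF();
        if (type == Type.ADD_RESOURCE) {
            uhrt.computeIfAbsent(resourceGuid, k -> new HashSet<>()).add(peerGuid);
        } else {
            Set<String> owners = uhrt.get(resourceGuid);
            if (owners != null) {
                owners.remove(peerGuid);
                if (owners.isEmpty()) {
                    uhrt.remove(resourceGuid); // no owner left, so drop the record
                }
            }
        }
    }
}
```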

Peer Server Closing

This section describes how a peer closes its server. When a peer closes its server, it sends a message to the main server to inform it. The message first states its type and then gives the peer’s GUID. When the main server receives the message, it deletes that peer from its UHPT. It also removes the peer from the owner list of each resource the peer owned; if the closing peer was the only owner of a resource, the whole record of that resource is deleted from the UHRT.
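The clean-up on the main server can be sketched in a few lines, assuming the UHRT maps each resource GUID to a mutable set of owner GUIDs. The helper name is hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class PeerShutdown {
    private PeerShutdown() {}

    /** Removes a closing peer from the UHPT and from every owner list in the UHRT. */
    static <P> void unregister(String peerGuid, Map<String, P> uhpt, Map<String, Set<String>> uhrt) {
        uhpt.remove(peerGuid);
        // Drop the peer from each owner set; a resource whose set becomes empty loses its record.
        uhrt.values().removeIf(owners -> {
            owners.remove(peerGuid);
            return owners.isEmpty();
        });
    }
}
```

This scans the whole UHRT on every close. If the main server also kept a per-peer list of resource GUIDs, only those entries would need to be touched.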

Performance Statistics

Opening Performance

As mentioned above, the opening process includes generating a GUID for every resource of the peer from its content. Therefore, opening performance depends heavily on the total size of the resources in each peer, as most of the time is spent reading the content of each resource and generating its GUID with the SHA-1 algorithm. The table below shows the total resource sizes and the time the corresponding peers took to open.

Image 2

Transferring Performance

The performance of a resource transfer mainly depends on the size of the resource transmitted. The table below shows resource sizes and the time taken to transfer them. Each measurement was repeated at least 3 times.

Image 3

Advantages and Weaknesses

Advantages

Stability

The system is stable because it handles several edge cases: the recorded owner peer has removed the file or has closed its server, the file to be received has the same name as a file that is already in the directory, and so on. Furthermore, the length of a resource is sent before the resource itself, and the number of bytes actually received is recorded during the transfer. By comparing the two, the receiving peer can tell whether an error occurred during the transfer.

Scalability

In this system, each peer is both a server and a client and maintains its own DHRT. Therefore, the resources themselves are never served by the main server. Furthermore, once a peer has requested an owner peer’s contact information from the main server, the information is stored in the peer’s own DHRT and the main server is not asked again, except in the special case mentioned above where the recorded owner no longer has the resource. Therefore, the system can contain a large number of peers without the performance of resource sharing being affected much.

More Operations

Besides transferring resources, peers in this system are free to add new files, remove files and close their servers. After each of these operations, an informing message is sent to the main server and the main server updates its tables, so these operations do not leave the system in an inconsistent state. Furthermore, even if a peer closes its server, its other operations are not affected: the peer can continue all of its operations except listening for requests and transferring resources to others.

Weaknesses

Opening Speed

The GUIDs of resources in this system are generated by SHA-1 from their contents. Therefore, generating the GUID of a large file may take a long time, which can slow down the opening of peers, as they have to generate the GUIDs of all their resources and send them to the main server.

Transferring Quality

Resource transfer in this system is based on the basic Socket class, and sending and receiving data is implemented with DataInputStream and DataOutputStream. Therefore, the quality and speed of data transfer in this system may not be as good as in systems that are based on RPC or RMI.

File Name Generation

In this system, if a peer finds that the file to be added to its directory has the same name as a file that is already in its directory, it renames the file to be added to the file’s GUID. However, a GUID is meaningless to humans, so it is difficult to recognize the file from its file name.

Routing Metric

The Routing Metric in this system is generated from the number of resources a peer owns. However, peers can add and remove resources afterwards, while the Routing Metric stays fixed at the value computed when the peer opened. As a result, the aim of the Routing Metric may not be reached. (The aim will be discussed in the next section)

Rationales for Design Decisions

GUID Generation

As mentioned above, the GUID of each peer is generated with the UUID algorithm, which produces a string that is, for practical purposes, globally unique. The reason is that two peers may have the same name; if I generated GUIDs from peer names using an algorithm such as SHA-1 or MD5, two peers with the same name would get the same GUID.

The GUIDs of resources in this system are generated by SHA-1 from the resources’ contents. I made this decision because I think a peer requests a file for its content, not for other information such as its file name. Therefore, I based resource GUIDs on content so that resources are identified only by their contents.

File Name Change

As mentioned several times, if a peer finds that the file to be added to its directory has the same name as a file that is already in its directory, it renames the file to be added to the file’s GUID. I made this decision because we assume that a peer won’t request a resource that has the same content as a resource it already owns. Therefore, once a file has been renamed to its GUID, no other file with that same name will be added later.

Routing Metric Generation

In this system, the Routing Metric of each peer is generated from the number of resources the peer has when it informs the main server that its server is opened. I designed it this way because I think a peer with a large number of resources may often be busy transferring resources. Therefore, when a peer asks the main server for the owner peer of a resource, I would like the main server to return the owner peer with fewer resources.
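This selection rule can be sketched as picking the owner with the smallest metric. The sketch assumes the metric is an integer resource count where lower is better, and that owners missing from the metric map (for example, peers that have closed) are skipped. The names are illustrative.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class OwnerSelection {
    private OwnerSelection() {}

    /** Returns the owner GUID with the smallest routing metric, or empty if no owner is registered. */
    static Optional<String> bestOwner(Set<String> owners, Map<String, Integer> routingMetricByPeer) {
        return owners.stream()
                .filter(guid -> routingMetricByPeer.containsKey(guid)) // ignore peers no longer in the UHPT
                .min(Comparator.comparingInt((String guid) -> routingMetricByPeer.get(guid)));
    }
}
```

If the metric were updated whenever a peer adds or removes a resource, the same comparison would reflect the current number of resources rather than the number at opening time.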