Monday, January 19, 2009

Grid Computing - JPPF

Introduction
There are a number of grid computing and parallel processing frameworks readily available, such as JPPF, GridGain, GigaSpaces and Terracotta.
At a high level these frameworks look much alike, and each of them fits a wide variety of software requirements.
I find JPPF simpler, easier and quicker to implement than the other grid computing frameworks. In this article I explain the main features JPPF provides and how they meet both basic and advanced needs.

Definition of JPPF
“In common language JPPF is a framework which is built on the Grid Computing Technology and is scalable to support from 10 to 10000n nodes.”

The above definition is good enough to start the discussion, but I will refine it at the end of the article so that we arrive at a more complete view of JPPF.

Basic Features of JPPF
Execution Algorithm

JPPF ships with an efficient execution algorithm, which can be customized or even overridden in extreme circumstances.
1. JPPF defines the concept of bundles (see note 1 in the Appendix). The user can customize these bundles and set the number of tasks per bundle as needed.
2. Execution policies let the user control how the tasks are executed by the nodes.


e.g.
Problem Statement
A separate set of nodes has been allocated to execute memory-intensive tasks.
How can a user tell the server that a specific type of task should be executed only by those specific nodes, and not by all the available nodes?

Solution
JPPF Execution Policies make it possible.
The following policy specifies that a set of tasks should be executed only by nodes whose “totalMemory” is more than 2 GB and whose “freeMemory” is more than 1 GB.



<?xml version="1.0" encoding="UTF-8"?>
<ExecutionPolicy>
  <AND>
    <MoreThan>
      <property>freeMemory</property>
      <!-- Free memory should be more than 1 GB -->
      <value>1000000</value>
    </MoreThan>
    <MoreThan>
      <property>totalMemory</property>
      <!-- Total memory should be more than 2 GB -->
      <value>2000000</value>
    </MoreThan>
  </AND>
</ExecutionPolicy>

Code that loads this policy and associates it with the tasks:
JPPFClient client = new JPPFClient();
ExecutionPolicy policy = PolicyParser.parsePolicy(new File("../../ExecutionPolicy.xml"));
client.submitNonBlocking(taskList, null, null, policy);
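
To make the meaning of the policy concrete, here is a minimal plain-Java sketch of how an AND of two MoreThan rules could be evaluated against a node's properties. It does not use the JPPF API and is not how JPPF implements policies internally; the class PolicySketch, its methods and the sample property values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: these are not JPPF classes.
class PolicySketch {
    // Rule: the named numeric property must be strictly greater than the threshold.
    static Predicate<Map<String, Long>> moreThan(String property, long threshold) {
        return props -> {
            Long value = props.get(property);
            return value != null && value > threshold;
        };
    }

    // Mirrors the XML policy above: freeMemory > 1000000 AND totalMemory > 2000000.
    static Predicate<Map<String, Long>> memoryPolicy() {
        return moreThan("freeMemory", 1_000_000L).and(moreThan("totalMemory", 2_000_000L));
    }

    public static void main(String[] args) {
        Map<String, Long> bigNode = new HashMap<>();
        bigNode.put("freeMemory", 1_500_000L);
        bigNode.put("totalMemory", 4_000_000L);

        Map<String, Long> busyNode = new HashMap<>();
        busyNode.put("freeMemory", 500_000L);
        busyNode.put("totalMemory", 4_000_000L);

        System.out.println("bigNode eligible: " + memoryPolicy().test(bigNode));
        System.out.println("busyNode eligible: " + memoryPolicy().test(busyNode));
    }
}
```

A node is eligible only when both rules hold, so a node with plenty of total memory but little free memory is skipped.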

Data Providers
Data providers are a mechanism for sharing common data between tasks. Any large or common data set can be shared this way. Care should be taken that this data is treated as read-only and is not modified by any of the tasks. JPPF defines 3 types of data providers:
1. URLDataProvider: reads data from the provided URL
2. MemoryMapDataProvider: stores the data in a HashMap
3. CompositeDataProvider: capable of holding multiple MemoryMapDataProvider objects

Code snippet using MemoryMapDataProvider:

JPPFClient client = new JPPFClient(); // creates a JPPFClient object
ExecutionPolicy policy = PolicyParser.parsePolicy(new File("../../ExecutionPolicy.xml")); // creates a policy object
List listOfObjects = new ArrayList(); // list of user-defined objects

MemoryMapDataProvider provider = new MemoryMapDataProvider(); // creates the data provider
provider.setValue("data", listOfObjects); // stores the list in the provider
client.submitNonBlocking(taskList, provider, null, policy); // submits the tasks to the JPPF server
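
The read-only requirement can be enforced rather than just documented. The following plain-Java sketch, which does not use the JPPF API, wraps the shared values in unmodifiable collections before handing them to tasks; the class SharedDataSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these are not JPPF classes.
class SharedDataSketch {
    // Copies the shared values and wraps them so tasks can read but not modify them.
    static Map<String, List<String>> readOnlyView(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : source.entrySet()) {
            copy.put(entry.getKey(),
                     Collections.unmodifiableList(new ArrayList<>(entry.getValue())));
        }
        return Collections.unmodifiableMap(copy);
    }

    // A hypothetical task body: reads the shared list and returns its size.
    static int countItems(Map<String, List<String>> shared) {
        return shared.get("data").size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> source = new HashMap<>();
        source.put("data", new ArrayList<>(Arrays.asList("a", "b", "c")));
        Map<String, List<String>> shared = readOnlyView(source);

        System.out.println("items visible to the task: " + countItems(shared));
        try {
            shared.get("data").add("d"); // a task trying to modify the shared data
        } catch (UnsupportedOperationException ex) {
            System.out.println("shared data is read-only");
        }
    }
}
```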

Monitoring

All nodes and servers expose their performance and configuration parameters through the JMX API, so any JMX-compliant tool can report the statistics. JPPF also ships with its own tool, which not only displays all the statistics about the nodes and the server, but also offers other options:

• Graphs can be created at runtime based on the selected parameters.
• Parameters can be changed at runtime.
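
As a rough idea of what a JMX-compliant tool does, the sketch below reads one attribute from an MBean server using only the standard JDK API. It queries the local JVM's standard memory MBean (java.lang:type=Memory), not a JPPF MBean; the names of JPPF's own MBeans are not covered in this article, so none are assumed here. The class name JmxSketch is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Reads a statistic through JMX, the same way a generic monitoring tool would.
class JmxSketch {
    // Returns the "used" figure of the HeapMemoryUsage attribute of this JVM.
    static long usedHeapBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            CompositeData usage =
                (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (Exception e) {
            throw new IllegalStateException("could not read the memory MBean", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("used heap (bytes): " + usedHeapBytes());
    }
}
```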






Advanced Features of JPPF

Grid Computing
JPPF is designed on the principles of grid computing. For example, 4 nodes talking to a JPPF server can be configured into 4 different clusters, in which each server and node talks to the others. Visit the JPPF wiki (see References) to explore this further.

Load Balancing
JPPF defines various load-balancing algorithms, which can be configured individually for every server in a network architecture. The predefined algorithms are:
1. Static algorithm – essentially a manual configuration in which the bundle size is fixed and defined when the driver is initialized.
Server configuration to be set in jppf-driver.properties:

task.bundle.size = 5
task.bundle.strategy=manual

2. Heuristic algorithm – popularly known as the repeated random sampling method, it sets the size of the bundle to be sent to a node according to that node's performance. The best part is that the bundle size keeps changing with the node's current performance. As the term heuristic suggests, the result is not guaranteed to be optimal, but it should be close to good; the same holds for drivers running with this configuration.
Server configuration to be set in jppf-driver.properties:

task.bundle.strategy=autotuned
task.bundle.autotuned.strategy = agressive
strategy.agressive.minSamplesToAnalyse = 100
strategy.agressive.minSamplesToCheckConvergence = 50
strategy.agressive.maxDeviation = 0.2
strategy.agressive.maxGuessToStable = 50
strategy.agressive.sizeRatioDeviation = 1.5
strategy.agressive.decreaseRatio = 0.2

3. Deterministic algorithm – based on the principles of a deterministic algorithm (http://en.wikipedia.org/wiki/Deterministic_algorithm), this strategy does not allow the bundle size strategy to be overridden at the node level.
Server configuration to be set in jppf-driver.properties:

task.bundle.strategy = proportional
task.bundle.proportional.strategy = optimized
strategy.optimized.performanceCacheSize = 2000
strategy.optimized.proportionalityFactor = 2
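
The bundle-sizing ideas above can be illustrated with a small plain-Java sketch. It is not JPPF code and not necessarily how the driver computes its bundles. The method fixedBundles cuts a task list into bundles of a fixed size, as with task.bundle.size = 5. The method proportionalSizes assumes, purely for illustration, that each node's share grows with the inverse of its mean task time raised to a configurable factor. All class, method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these are not JPPF classes.
class BundleSizingSketch {
    // Static strategy: cut the task list into bundles of a fixed size.
    static <T> List<List<T>> fixedBundles(List<T> tasks, int bundleSize) {
        if (bundleSize <= 0) {
            throw new IllegalArgumentException("bundleSize must be positive");
        }
        List<List<T>> bundles = new ArrayList<>();
        for (int start = 0; start < tasks.size(); start += bundleSize) {
            int end = Math.min(start + bundleSize, tasks.size());
            bundles.add(new ArrayList<>(tasks.subList(start, end)));
        }
        return bundles;
    }

    // Assumed proportional rule: weight = (1 / meanTaskMillis) ^ factor, and each
    // node receives its weighted share of the pending tasks (at least 1 task).
    static int[] proportionalSizes(double[] meanTaskMillis, int pendingTasks, int factor) {
        double[] weights = new double[meanTaskMillis.length];
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weights[i] = Math.pow(1.0 / meanTaskMillis[i], factor);
            sum += weights[i];
        }
        int[] sizes = new int[weights.length];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = Math.max(1, (int) Math.round(pendingTasks * weights[i] / sum));
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            tasks.add(i);
        }
        System.out.println("fixed bundles: " + fixedBundles(tasks, 5).size());

        // Two nodes: the first one is twice as fast as the second.
        int[] sizes = proportionalSizes(new double[] {10.0, 20.0}, 100, 2);
        System.out.println("proportional sizes: " + sizes[0] + " and " + sizes[1]);
    }
}
```

With a factor greater than 1, the faster node receives a more than proportionally larger bundle, which is one way such a factor could be used to favour fast nodes.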

Fault Tolerance and Self Repair
By default, nodes and servers are set up as fault-tolerant systems. Each node can detect whether the server is available and tries to reconnect to it at regular intervals. The same is true for servers running in a clustered environment.
The following properties need to be defined in the node configuration:

# In seconds. Default is 1
reconnect.initial.delay=5
# Amount of time in seconds. Default is 60
reconnect.max.time=20
# Interval in seconds between 2 consecutive attempts. Default is 1
reconnect.interval=2
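
To see how these three values could interact, the sketch below computes the times at which reconnection attempts would be made. It is a plain-Java illustration, not JPPF code, and it rests on two assumptions drawn only from the property names: the first attempt happens after the initial delay, and the maximum time is counted from that first attempt. The class ReconnectSketch and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is not the JPPF reconnection code.
class ReconnectSketch {
    // Returns the attempt times in seconds, measured from the moment the connection is lost.
    static List<Integer> attemptTimes(int initialDelay, int maxTime, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Integer> times = new ArrayList<>();
        // Assumption: maxTime is measured from the first attempt, not from the disconnect.
        for (int elapsed = 0; elapsed <= maxTime; elapsed += interval) {
            times.add(initialDelay + elapsed);
        }
        return times;
    }

    public static void main(String[] args) {
        // Values from the node configuration above: delay 5, max time 20, interval 2.
        System.out.println(attemptTimes(5, 20, 2));
    }
}
```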


Why JPPF
Now that we have covered the basic and advanced features of JPPF, I would like to mention a few more good points about it:
1. Easy to set up. I was able to run the sample application on JPPF within 1 hour.
2. Detailed and simple configuration files, which can be used without modification because the default values were chosen with the needs of a wide range of applications in mind.
3. Unlike many other open source frameworks, it provides enough documentation for a developer to understand the API and override it if needed.
4. Tasks can be defined from existing code by using annotations (see note 2 in the Appendix).
5. Implements the JCA 1.5 specification to support various application servers such as JBoss, Sun, Oracle OC4J, WebSphere and WebLogic.
6. Use of TCP multiplexers: JPPF works very well in a secured network where only a limited number of ports are open.
7. The serialization process can be customized, and developers can specify their own classes for serializing objects.
8. Can be used to create a Windows screensaver installer.


Sample Application
Introduction

This sample application does the following:
1. Creates an ongoing process that submits 5 tasks every second.
2. Each task is very simple: it just prints “task Executed” on the console of the node that executes it.
3. Each task sets “Hello World” as its output for the TaskListener; this is returned to the client, which prints it on its console.
Note: because this is an ongoing process, you have to end the application forcefully (CTRL+C).
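
The flow described above can be approximated on a single machine with the JDK's own executor service. The sketch below is a local analogue only: it uses a thread pool in place of JPPF nodes, runs a fixed number of rounds instead of looping forever, and does not use the JPPF API. The class HelloBatchSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local analogue of the sample application; a thread pool stands in for the grid.
class HelloBatchSketch {
    // Submits one batch of tasks and collects the result of each task.
    static List<String> runBatch(ExecutorService pool, int taskCount) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(() -> {
                System.out.println("task Executed"); // printed where the task runs
                return "Hello World";                // result returned to the submitter
            });
        }
        List<String> results = new ArrayList<>();
        try {
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    // Runs a fixed number of rounds, pausing between them, and returns all results.
    static List<String> runRounds(int rounds, int tasksPerRound, long pauseMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> all = new ArrayList<>();
        try {
            for (int round = 0; round < rounds; round++) {
                all.addAll(runBatch(pool, tasksPerRound));
                Thread.sleep(pauseMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return all;
    }

    public static void main(String[] args) {
        // 3 rounds of 5 tasks, one round per second.
        for (String result : runRounds(3, 5, 1000L)) {
            System.out.println(result);
        }
    }
}
```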

Setting up Sample Application
1) Download the full version of JPPF 1.5 from <> and unzip it anywhere on your local box; you can also download it directly from SourceForge.
2) Go to /JPPF/bin and type ant runtime
3) Go to the /JPPF/build directory, where you will find 4 zip files, including:
a. jppf-driver-bin-1.5-0545-20081025.zip - This is the JPPF Driver
b. jppf-node-bin-1.5-0545-20081025.zip - This is the Node
c. jppf-gui-bin-1.5-0545-20081025.zip - This is the Admin GUI
4) Unzip jppf-driver-bin-1.5-0545-20081025.zip on your local box into a directory named “JPPFDriver”
5) Unzip jppf-node-bin-1.5-0545-20081025.zip on your local box into a directory named “JPPFNode”
6) Unzip jppf-gui-bin-1.5-0545-20081025.zip on your local box into a directory named “JPPFGui”
7) Open a console, go to the JPPFDriver directory and type “ant run.driver”; your driver is now up and running
8) Open a console, go to the JPPFNode directory and type “ant run.node”; your node is now up and running, connected to the driver started in the previous step
9) To run the JPPF Admin GUI, execute “ant run.gui” from a console within the “JPPFGui” directory.
10) Download the client application and extract it on your local box into a directory named “examples”.

11) Open the build.properties file and define 2 properties according to your system configuration
a. JPPF_HOME=
b. jppf.config= .
12) Open a console and type ant run.example (this example submits 4 tasks every second).
13) Now look at the various tabs provided in the GUI (refer to step 9); you will be able to see the statistics published by the node and the server.

Conclusion
Having looked at all these features, I would say it is the right time to refine the definition of JPPF:

Java Parallel Processing Framework – based on the JDK 5 Executor Service, a highly customizable and scalable open source grid computing framework with a very efficient parallel processing/load balancing algorithm, packaged with a handy tool for monitoring system performance.


References
http://www.jppf.org/wiki/index.php?title=Architecture#Extended_Grid_Topology
http://www.jppf.org/wiki/index.php?title=JPPF_Tasks_and_Execution_Policy


Appendix
1. A bundle can be thought of as a specific number of tasks packaged in a box so that they can be submitted to a node for execution. The number of tasks packaged in a bundle can be increased or decreased according to the configuration defined by the user.
2. Available in JPPF 1.5
