Archive for the ‘Cassandra’ Category

Today I’m going to share my experience of getting started with Apache Cassandra. One of the hardest steps in learning any NoSQL store is letting go of normalization principles and relational DB structures. Relational databases are designed to persist normalized data without duplication. One of the main changes here is that you need to design for your queries: think about what your reports or finder methods need, and build the persistent structure accordingly.

Hundreds of web pages, books and papers explain what Cassandra is, what Hazelcast is, what Hadoop, MemcacheDB, MongoDB, etc. are. But none of them explains HOW TO migrate my data from a relational DB to one of them.

We wanted to migrate the persistent data of two of our modules, Turmeric SOA Monitoring and Turmeric SOA Rate Limiting. In Turmeric we use MySQL as the relational database. After a week of reading about and analyzing several NoSQL options, we decided on Cassandra. (I hope to write another post about the whys.) By the way, I highly recommend this book: Cassandra: The Definitive Guide

From Relational tables to Keyspaces

The big question now is how to migrate them. Well, this is what we did:
Following an Agile best practice, if something is too hard or complex, just break it into small challenges. After all, we still had a good gap for an MMF (“Minimal Marketable Feature”; refer to Software by Numbers). So:

Step 1: Move our relational DB tables to Cassandra Column Families.
Step 2: Customize the new Column Families in your Keyspace so that they hold all the needed data without JOIN operators.
Step 3: Explode those Column Families as your finder and query methods need. Typically a finder or query method should use one Column Family.
Step 4: Customize the creator and updater methods according to the previous changes. Don’t be scared if you are saving duplicated data. Keep in mind: “think for your queries, forget the normalization rules!”
Step 5: while (!pleased) -> repeat steps 3 and 4
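
To make steps 3 and 4 a bit more concrete, here is a minimal sketch in plain Java, with no Cassandra involved. Each "column family" is just an in-memory map, and all names (QueryFirstSketch, metricsByService, metricsByConsumer) are made up for illustration; they are not part of Turmeric. The point is only the shape: one structure per finder, and a creator that writes the same value to each of them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each "column family" is modeled as a map of
// row key -> (column name -> value). Nothing here talks to Cassandra.
class QueryFirstSketch {
    // One structure per query: metrics looked up by service name...
    static final Map<String, Map<String, String>> metricsByService = new HashMap<>();
    // ...and the same metrics looked up by consumer name.
    static final Map<String, Map<String, String>> metricsByConsumer = new HashMap<>();

    // Step 4: the creator writes the (duplicated) value to every
    // structure that some finder reads from.
    static void saveMetric(String metricId, String service, String consumer, String value) {
        metricsByService.computeIfAbsent(service, k -> new HashMap<>()).put(metricId, value);
        metricsByConsumer.computeIfAbsent(consumer, k -> new HashMap<>()).put(metricId, value);
    }

    // Step 3: each finder reads exactly one structure, so no JOIN is needed.
    static Map<String, String> findByService(String service) {
        return metricsByService.getOrDefault(service, new HashMap<>());
    }

    static Map<String, String> findByConsumer(String consumer) {
        return metricsByConsumer.getOrDefault(consumer, new HashMap<>());
    }

    public static void main(String[] args) {
        saveMetric("m1", "EchoService", "consumerA", "120ms");
        saveMetric("m2", "EchoService", "consumerB", "95ms");
        System.out.println(findByService("EchoService"));
        System.out.println(findByConsumer("consumerA"));
    }
}
```

The price is that every write touches two places, which is exactly the duplication step 4 tells you not to be scared of.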

A Cassandra DAO

Now, the hardest step is #1. Don’t panic: we developed a generic (in fact, it uses Java Generics) Cassandra DAO to help with this migration. Since this work was needed for the project I’m currently working on, you will find it as a submodule of TurmericSOA, but under the Apache License you can use it by adding it to your Maven dependency file.


<dependency>
<groupId>org.ebayopensource.turmeric.utils</groupId>
<artifactId>turmeric-utils-cassandra</artifactId>
<version>1.2.0.0-SNAPSHOT</version>
<type>jar</type>
</dependency>

Features

  • 100% Java code
  • It can run an embedded Cassandra service or just talk to your external Cassandra service
  • Uses the Hector library as the Java Cassandra client
  • Dynamic [Super] Column Family creation
  • Key Types and Data Types defined at runtime with the use of Generics
  • Main CRUD methods supported:
boolean containsKey(KeyType key);

void delete(KeyType key);

T find(KeyType key);

Map> findItems(final List keys, final Long rangeFrom, final Long rangeTo);

Set<T> findItems(final List<KeyType> keys, final String rangeFrom, final String rangeTo);

Set<KeyType> getKeys();

void save(KeyType key, T model);
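
If the role of the generics is not obvious, the following sketch may help. It is NOT the Turmeric DAO: it is a hypothetical in-memory class (InMemoryDaoSketch) that mirrors the simple CRUD methods listed above, with a HashMap standing in for the column family. The Set<KeyType> return type of getKeys() is my assumption here.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in; the real DAO talks to Cassandra via Hector.
class InMemoryDaoSketch<KeyType, T> {
    // The "column family": one stored model per row key.
    private final Map<KeyType, T> rows = new HashMap<>();

    boolean containsKey(KeyType key) {
        return rows.containsKey(key);
    }

    void delete(KeyType key) {
        rows.remove(key);
    }

    T find(KeyType key) {
        return rows.get(key); // null when the key is absent
    }

    Set<KeyType> getKeys() {
        return new LinkedHashSet<>(rows.keySet()); // defensive copy
    }

    void save(KeyType key, T model) {
        rows.put(key, model);
    }

    public static void main(String[] args) {
        // The key type and the data type are chosen by the caller.
        InMemoryDaoSketch<String, Integer> dao = new InMemoryDaoSketch<>();
        dao.save("answer", 42);
        System.out.println(dao.find("answer"));
    }
}
```

The same class body works for String keys, Long keys or anything else, which is the whole idea behind defining the types through generics.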

Main Classes
This util module contains the following packages and classes:

org.ebayopensource.turmeric.utils.cassandra.service

  • CassandraManager: initializes a static EmbeddedCassandraService instance based on a YAML configuration file

org.ebayopensource.turmeric.utils.cassandra.hector

  • HectorManager: manages keyspace and column family creation and reading. It uses the Hector API
  • HectorHelper: includes some utility methods based on Java Reflection and Java Generics, e.g. retrieving the field names of a POJO, which are used as column names in Cassandra keyspaces

org.ebayopensource.turmeric.utils.cassandra.dao

  • AbstractColumnFamilyDao: as its name suggests, this is the base class that every DAO should extend. It defines and implements the basic DAO operations using the Hector API.
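
The reflection trick mentioned for HectorHelper (POJO field names used as column names) can be sketched with the standard java.lang.reflect API. This is only an illustration of the idea, not HectorHelper's code; the class FieldNamesSketch, the method columnNames and the FooPojo fields are all invented for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deriving column names from a POJO's fields.
class FieldNamesSketch {
    // Made-up POJO; the field names are placeholders.
    static class FooPojo {
        private String name;
        private Long createdAt;
        private static final int NOT_A_COLUMN = 0; // static, so it is skipped
    }

    // Returns the names of the instance fields declared on the given type.
    static List<String> columnNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields: they are not row data.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(FooPojo.class));
    }
}
```

Note that getDeclaredFields() makes no promise about ordering, so do not rely on the column names coming back in declaration order.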

Configuration files

Here is the directory structure of the configuration files:

META-INF/
         security/
                  config/
                         cassandra/
                                   cassandra.properties

An example of this property file:

cassandra-cluster-name=TurmericCluster
cassandra-host-ip=127.0.0.1
cassandra-rpc-port=9160
cassandra-my-keyspace=My-keyspace

#column families
cassandra-foo-column-family=foo
cassandra-bar-column-family=bar
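
In case you want to read such a file from your own code, here is a small sketch using the standard java.util.Properties class. The class and method names (CassandraPropsSketch, parse, loadFromClasspath, rpcPort) are mine; how Turmeric itself loads the file internally is not shown here.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for reading the property file shown above.
class CassandraPropsSketch {
    // Parses property text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Loads the file from the classpath,
    // e.g. "META-INF/security/config/cassandra/cassandra.properties".
    static Properties loadFromClasspath(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = CassandraPropsSketch.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                throw new FileNotFoundException(path);
            }
            props.load(in);
        }
        return props;
    }

    // The port is stored as text, so convert it explicitly.
    static int rpcPort(Properties props) {
        return Integer.parseInt(props.getProperty("cassandra-rpc-port"));
    }

    public static void main(String[] args) {
        Properties props = parse(
                "cassandra-cluster-name=TurmericCluster\n"
              + "cassandra-host-ip=127.0.0.1\n"
              + "cassandra-rpc-port=9160\n");
        System.out.println(props.getProperty("cassandra-host-ip") + ":" + rpcPort(props));
    }
}
```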

How to use it….
It is very intuitive. Let’s suppose we have a Foo table in our relational DB, e.g. MySQL.
So:

Create the BaseDao interface

import java.util.Set;

// Basic operations for the Foo column family; row keys are Strings
public interface BaseDao {
		  public void delete(String key);
		  public Set<String> getKeys();
		  public boolean containsKey(String key);
		  public void save(String key, FooPojo fooPojo);
		  public FooPojo find(String key);
}

Create the FooDao interface

public interface FooDao extends BaseDao  {
}

Create the FooDao implementation


public class FooDaoImpl extends AbstractColumnFamilyDao<String, FooPojo>
		implements FooDao {
	public FooDaoImpl(final String clusterName, final String host, final String keySpace, final String cf, final Class<String> kTypeClass) {
		super(clusterName, host, keySpace, kTypeClass, FooPojo.class, cf);
	}

}

… in your code

//initiates an embedded Cassandra Service
CassandraManager.initialize();

//creates our Foo Column Family
FooDao fooDao = new FooDaoImpl("myCluster", "127.0.0.1", "myKeyspace",
				"myColumnFamilyName", String.class);

And voilà, you have your relational table migrated to a Cassandra column family!!!

You can also browse the unit test classes to see how they are implemented…

enjoy it!!!


Hi dude..

In this post I’m going to show you a brief demo of the Rate Limiter feature with Turmeric SOA and Cassandra integration.

First, let me explain the scenario and the environment setup.

Scenario:

- A resource (a Service in this case) must be denied after a member of a Subject Group makes 4 calls to an Echo Service
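
Just to pin down what this scenario means, here is a toy sketch of the counting rule in plain Java. It is not the Turmeric Rate Limiter (which evaluates policies and, in this demo, keeps its counters in Cassandra); CallCounterSketch and allow are hypothetical names, and the counter lives in a local map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: allow the first maxCalls calls per subject, deny the rest.
class CallCounterSketch {
    private final int maxCalls;
    private final Map<String, Integer> callsBySubject = new HashMap<>();

    CallCounterSketch(int maxCalls) {
        this.maxCalls = maxCalls;
    }

    // Counts the call and reports whether it is still within the limit.
    synchronized boolean allow(String subject) {
        int calls = callsBySubject.merge(subject, 1, Integer::sum);
        return calls <= maxCalls;
    }

    public static void main(String[] args) {
        CallCounterSketch limiter = new CallCounterSketch(4);
        for (int i = 1; i <= 5; i++) {
            System.out.println("call " + i + " allowed? " + limiter.allow("someSubject"));
        }
    }
}
```

A map like this only works inside one JVM. With several servers each one would keep its own count, which is why the counters in this demo are stored in Cassandra instead.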

Environment:

  • A simple Echo Web Service deployed in two customised Jetty servers, Jetty-Turmeric 1 and 2 (see chart Environment Setup)
  • The Rate Limiter Service, Policy Service and PolicyAdminUI deployed in the other two Jetty servers
  • 4 Cassandra nodes running as standalone services. In my case I used Cassandra v0.8.2
  • A turmericdb schema in a MySQL database to store the predefined policies
  • Any web browser as a REST client; Firefox in my case

Jetty-Turmeric Web Server Setup:

Download and decompress jetty-turmeric. This customised Jetty already provides an Echo Service as an example. We need 4 instances (copies) of it, so let’s call them jetty-turmeric-1.0.1.0-SNAPSHOT-1, jetty-turmeric-1.0.1.0-SNAPSHOT-2, jetty-turmeric-1.0.1.0-SNAPSHOT-3 and jetty-turmeric-1.0.1.0-SNAPSHOT-4. Don’t forget to change their listening ports, otherwise you will get a nightmare of conflicts. I chose 8080, 8081, 8082 and 8083. You also need to change the SSL ports, and the debugging port in case you want to do remote debugging with Eclipse:  -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000

Then, let’s prepare Jetty #1 and #2, the ones that host the Echo Service Consumer. One of the yellow boxes we need to set up in the SIF is the AuthenticationHandler, a handler that calls the Authentication Service deployed in Jetty servers #3 and #4 respectively. So we need to edit the {JETTY-HOME}/resources/META-INF/security/config/AuthenticationPolicy.xml file and add the operation for which the authentication process must run:

<resource name="ExampleEchoServiceV1" default-authentication-method="basic" type="SERVICE">
<operation name="echo">
<authentication-method>basic</authentication-method>
</operation>
</resource>

Now, since we use our Turmeric-Cassandra-util package, we need to add some jar dependencies under the {JETTY-HOME}/lib/turmericsoa-security/ folder:
log4j-1.2.9.jar
TurmericUtils-0.9.0-Beta.jar

As some services are deployed on another web server, we must indicate the actual endpoint, and don’t forget to change the transport protocol from LOCAL to HTTP. This applies to {JETTY-HOME}/lib/turmericsoa-security/AuthenticationServiceConsumer-1.x.x.jar/META-INF/soa/client/config/AuthenticationService/ClientConfig.xml

<client-config-list xmlns="http://www.ebayopensource.org/turmeric/common/config">
<client-config service-name="{http://www.ebayopensource.org/turmeric/services}AuthenticationService">
<service-interface-class-name>org.ebayopensource.turmeric.services.authenticationservice.intf.AuthenticationService</service-interface-class-name>
<service-location>http://localhost:8082/security/AuthenticationServiceV1</service-location>
<client-instance-config>
<invocation-options>
<preferred-transport name="HTTP11">
<override-options>
<skip-serialization>true</skip-serialization>
</override-options>
</preferred-transport>
<request-data-binding>XML</request-data-binding>
<response-data-binding>XML</response-data-binding>
</invocation-options>
</client-instance-config>
</client-config>
</client-config-list>

Use localhost:8083 for the Jetty #2 setup.

The same applies to {JETTY-HOME}/lib/turmericsoa-security/RateLimiterServiceConsumer-1.x.x.jar/META-INF/soa/client/config/RateLimiterService/ClientConfig.xml

Endpoint: http://localhost:8082/security/RateLimiterServiceV1 or http://localhost:8083/security/RateLimiterServiceV1 for Jetty#4

Now, let’s prepare Jetty #3 and #4.

As we can see in the above chart, Jetty #3 hosts the Security Service, the Policy Service and PolicyAdminUI (the web UI for policy administration).
Download and extract them under the webapps folder. Then the custom updates are:
For Security Service: None
For Policy Service: None
For PolicyAdminUI: in {Jetty3-HOME}/webapps/policy/lib/web.xml, update the endpoint ports to those of Jetty #3, that is, 8082 for http and 8445 for https.

Later, in both {JETTY3-HOME} and {JETTY4-HOME} /resources/META-INF/config/cassandra.properties, you need to add the Rate Limiter Keyspace info and also indicate there that Cassandra will run in standalone mode instead of embedded mode, which is the default.

################################
###### RATE LIMITER COUNTER ####
################################
cassandra-rl-cluster-name=TestCluster
cassandra-host-ip=127.0.0.1
cassandra-rpc-port=9160
cassandra-rl-keyspace=rl

#column families
cassandra-active-rl-column-family=activeRL
cassandra-active-effect-column-family=activeEffect

embedded=false

This config should match your Cassandra configuration. The other Cassandra files do not need to be changed because we set embedded=false. This tells Turmeric that the Cassandra service is running in standalone mode; when it is set to true, TurmericSOA will start its own embedded Cassandra service.
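
The effect of the embedded flag can be pictured with a few lines of Java. This is only a sketch of the decision as described above (embedded is the default, embedded=false means standalone). The class CassandraModeSketch and its Mode enum are invented for illustration, and the way Turmeric really reads the flag may differ.

```java
import java.util.Properties;

// Hypothetical sketch of choosing the Cassandra mode from the properties.
class CassandraModeSketch {
    enum Mode { EMBEDDED, STANDALONE }

    // A missing "embedded" key falls back to embedded mode, the default.
    static Mode modeOf(Properties props) {
        boolean embedded = Boolean.parseBoolean(props.getProperty("embedded", "true"));
        return embedded ? Mode.EMBEDDED : Mode.STANDALONE;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("embedded", "false");
        System.out.println(modeOf(props));
    }
}
```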

If you need more details on how to enable Rate Limiting for your own service, see the Turmeric Rate Limiting Setup page.

enjoy it!!

New coming features…

Posted: September 14, 2011 in Cassandra, Turmeric

Here are some of the new features that will be delivered in the next release.

      • Expanded REST Support: the SOA Framework will now support REST operations that come in with the HTTP verbs PUT and DELETE, in addition to GET and POST. Operation mapping in the service config XML can be done for each HTTP request type (GET, POST, PUT and DELETE).
      • OSGi: SOA consumers can now use SOA in an OSGi-safe way. Currently, the SOA runtime loads resources and handlers from the classloader. When running in an OSGi environment, resources such as ClientConfig.xml, ServiceConfig.xml and application-customized handlers don’t belong to the SOA runtime bundle. The SOA OSGi framework now provides a registration API that supplies the config resource information directly as streams, so that getResourceAsStream() does not need to be called.
      • Zero Config Consumer: this feature eliminates the need for a separate consumer jar. The Plugin now has Simple and Advanced modes for creating a Consumer from WSDL. In Simple mode, no consumer project is created, and the runtime automatically switches to the default Client Config file.
      • Protocol Buffers Support: the SOA framework now supports Google’s data format, Protocol Buffers, along with the existing data formats XML, FAST_INFOSET, JSON and NV. The SOA tooling generates the required artifacts to handle protobuf if the service is enabled for it. The developer does not need to write any special code to use the protobuf format, apart from the configuration in the Client Config file, just like any other format.
      • Distributed Rate Limiter counters: by using a Cassandra ring, the rate limiter feature can be deployed in cluster mode.
      • Aggregation data for Monitoring: the powerful monitoring console will now show summarized data. It can also read data from distributed nodes in Cassandra.