Thursday, November 12, 2009

FreeRADIUS with MySQL cluster

About:
This post is all about deploying FreeRADIUS with MySQL Cluster: understanding the FreeRADIUS deployment options with MySQL Cluster for high availability and ease of database management.

Overview:
A few months ago I was working on integrating FreeRADIUS with MySQL Cluster to provide a better alternative for high availability and scalability of the data store. That work has been available as white papers for some time now: "The Deployment of FreeRADIUS with MySQL Cluster" and the "Strategy Guide for Building Highly Scalable & Available AAA Services".

Look there for how to configure MySQL Cluster and how to configure FreeRADIUS with a MySQL Cluster database. Since the integration of FreeRADIUS with MySQL Cluster and the testing of the same were also carried out, this post highlights some of those areas too.

I also ported FreeRADIUS to Solaris 10, and used Solaris Containers to simulate as many systems as required; in all, I used three systems running Solaris 10. The same could be done with Linux too, but we have to carefully manage the co-location of MySQL Cluster's SQL node, the management node, and the FreeRADIUS application itself.

MySQL Cluster needs a minimum of three nodes to provide high availability of the stored data. The data is replicated synchronously across the two data nodes, while a third node serves as the management node: it manages the cluster and casts the quorum (arbitration) vote in the event of a node failure. This third node can also host the application as well as the MySQL server. The job becomes easier if we have multiple systems, or multiple OS environments (a high-level abstraction providing virtually independent systems) such as Solaris Containers.
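To make the topology concrete, here is a minimal sketch of a cluster config.ini for this three-node layout; the hostnames are hypothetical placeholders, not the ones from my setup:

# config.ini on the management node (hostnames are examples)
[ndbd default]
NoOfReplicas=2          # data replicated synchronously across the two data nodes

[ndb_mgmd]
hostname=mgmt-node      # management node; also the arbitrator on node failure

[ndbd]
hostname=data-node1
[ndbd]
hostname=data-node2

[mysqld]
hostname=mgmt-node      # SQL node (MySQL server) co-located with management/app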

When FreeRADIUS receives a request from one of its clients, whether for authentication, authorization, or accounting, it looks at its configured data store option and sends the appropriate query to fetch the data. Based on the response from the data store, FreeRADIUS takes the appropriate step. Since FreeRADIUS works with MySQL by looking up indexed columns, which suits MySQL Cluster very well, no changes to the FreeRADIUS MySQL data store module are required; there are, however, a few configuration changes, which are documented in the white paper. Once a request lands on the MySQL Cluster, either data node can serve it via the SQL node (the MySQL server).
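The exact configuration changes are in the white paper; as a rough sketch, the FreeRADIUS sql module mainly needs to point at the SQL node (the server name and credentials below are placeholders):

# raddb/sql.conf (excerpt); server and credentials are placeholders
sql {
    driver = "rlm_sql_mysql"
    server = "mgmt-node"       # hypothetical hostname of the SQL node
    login = "radius"
    password = "radpass"
    radius_db = "radius"
}

One point worth noting: for the data to actually live in the cluster, the RADIUS schema tables must use the NDB storage engine (e.g. ALTER TABLE radcheck ENGINE=NDBCLUSTER;), since only NDB tables are replicated across the data nodes.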

Note: I ported FreeRADIUS to Solaris; I will share what I learned from that in a subsequent post.

Acknowledgment:
I would like to thank Alan DeKok for his help in walking me through the high-level architecture and the working of the modules and options available with FreeRADIUS.

Friday, October 30, 2009

wansimulator

About:
Are you thinking of setting up a "wan-simulator" for testing application functionality and the performance overhead of a WAN (wide area network) within a lab environment? Then this might be of interest to you.

Overview:

Any x86 (64/32-bit) system can be used to host this. The system could run the latest OpenSolaris; to download OpenSolaris, visit http://genunix.org/. The wan-simulator leverages Project Crossbow: it is a STREAMS module/driver that emulates a WAN environment. It captures packets from IP and manipulates them according to the WAN setup.

There are five parameters to control the environment.
  1. network propagation delay,
  2. bandwidth,
  3. drop rate,
  4. re-ordering, and
  5. corruption.
Hxbt acts like a "pipe" between two hosts, and the pipe is the emulated WAN.

Note that the current hxbt implementation only operates on the output side. Hxbt can be visualized as an emulated router on the outgoing path, while the returning path is a "direct" link. In normal circumstances this asymmetry should not be a problem, as long as the parameters are set to reflect the WAN condition on BOTH paths. For example, if the outgoing and incoming paths each introduce a 10 ms delay, set hxbt to emulate a 20 ms delay. If both incoming and outgoing paths need to be emulated, set up hxbt on machines on both sides of the path.
For more details about the wan-simulator please visit OpenSolaris networking.

This case study is a typical example where one needs a simulated WAN (or systems that really are geographically apart) to test replication over long distance for disaster recovery. There are other use cases too: for instance, to understand the impact of AJAX/JavaScript running on a web client or handheld device over a long distance, we could set up such an environment and do the required testing within the comfort of one location. This eases some of the challenges one would face doing such an activity for real, like coordinating over a long distance to ensure the remote environment is sane, per the requirements.

In this disaster recovery scenario, the two sites are around 1500 miles apart, the bandwidth is 8 Mbps, and the link is bad: we simulate 5% packet drops and 5% corruption of data. There are multiple IP addresses on each side talking to each other over this simulated WAN to replicate the data, monitor the health of each site, manage Inter-Cluster Resource Management (ICRM), and so on. Since these two sites are on two different TCP/IP networks, the wan-simulator acts as the router between both sites, as well as providing the wide-area-network behaviour, using hit-box/hxbt (the wan-simulator).

If you need to compute the latency to use, visit netqos, or compute the required latency yourself; packet drops and data corruption can be understood by observing them manually using ping and other sniffing tools.

Installation of wansimulator:


All you have to do is download OpenSolaris from http://www.genunix.org/. I tried this on OpenSolaris b122, but you could download and try it on the latest build too.

1. Download the pre-compiled binaries tarball from this blog, or download the latest source code from http://opensolaris.org/os/community/networking/. Here I have used the tarball from this blog.

2. Unpack the tarball and copy the pre-compiled binaries to the appropriate locations:
cd /extracted_directory
cp onnv/usr/src/uts/intel/hxbt/obj32/hxbt /kernel/drv
cp onnv/usr/src/uts/intel/hxbt/obj64/hxbt /kernel/drv/amd64
cp onnv/usr/src/uts/common/inet/hxbt/hxbt.conf /kernel/drv
cp onnv/usr/src/cmd/cmd-inet/usr.sbin/ifhit/ifhit /usr/sbin
add_drv -m "* 0666 root root" hxbt
How to verify it is installed?

Insert the hxbt module to verify the successful installation and configuration:
ifconfig nge0 modlist
ifconfig nge0 modinsert hxbt@2
ifconfig nge0 modlist

Un-install:

This is what you would do once you are done with the wansimulator.

root@wansimulator:~#ifconfig nge0 modlist
0 arp
1 ip
2 hxbt
3 nge
root@wansimulator:~#ifconfig nge0 modremove hxbt@2
root@wansimulator:~#modinfo | grep hxbt
252 fffffffff7dc3000 2460 - 1 hxbt (hxbt stream module v1.1)
252 fffffffff7dc3000 2460 291 1 hxbt (hxbt stream driver v1.1)
root@wansimulator:~#modunload -i 252
root@wansimulator:~#pfexec rem_drv hxbt

Configure wan-simulator:

Pre-requisite
[optional]: Configure this OpenSolaris wansimulator system as a router. For this case study I did configure it as one; the steps to enable routing are sketched below.
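A minimal sketch using routeadm (the -u flag applies the changes to the running system without a reboot):

# enable IPv4 forwarding and routing, then apply to the live system
routeadm -e ipv4-forwarding -e ipv4-routing
routeadm -u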

Here are the network interfaces hosted on this system which take part in routing.

root@wansimulator:~# ifconfig nge0
nge0: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 2
inet 192.168.50.250 netmask ffffff00 broadcast 192.168.50.255
ether 0:21:28:44:67:a
root@wansimulator:~# ifconfig nge1
nge1: flags=1100943<UP,BROADCAST,RUNNING,PROMISC,MULTICAST,ROUTER,IPv4> mtu 1500 index 3
inet 10.10.50.250 netmask ffffff00 broadcast 10.10.50.255
ether 0:21:28:44:67:b
root@wansimulator:~#


---> Running ifhit without options lists all the targets that are configured to take a hit.

root@wansimulator:~# ifhit
hxbt target list:

root@wansimulator:~#


----> Let's look at the time to reach the destination before configuring the hit-box (wansimulator).
===> ping from primary network

root@pnode1:~# ping -s pnode1
PING pnode1: 56 data bytes
64 bytes from pnode1 (10.10.50.105): icmp_seq=0. time=0.178 ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=1. time=0.0670 ms
^C
----pnode1 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.0670/0.122/0.178/0.078
root@pnode1:~# ping -s drnode1
PING drnode1: 56 data bytes
64 bytes from drnode1 (192.168.50.103): icmp_seq=0. time=0.443 ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=1. time=0.389 ms
^C
----drnode1 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.389/0.416/0.443/0.038
root@pnode1:~#


===> ping from DR network

root@drnode1:~# ping -s drnode1
PING drnode1: 56 data bytes
64 bytes from drnode1 (192.168.50.103): icmp_seq=0. time=0.172 ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=1. time=0.154 ms
^C
----drnode1 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.154/0.163/0.172/0.013
root@drnode1:~# ping -s pnode1
PING pnode1: 56 data bytes
64 bytes from pnode1 (10.10.50.105): icmp_seq=0. time=0.453 ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=1. time=0.392 ms
^C
----pnode1 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.392/0.422/0.453/0.043
root@drnode1:~#


---> Following are the IP addresses configured on the wan-simulator (hit-box):
192.168.50.250 drrouter
10.10.50.250 prouter
root@wansimulator:~# ifconfig nge0
nge0: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> mtu 1500 index 2
inet 192.168.50.250 netmask ffffff00 broadcast 192.168.50.255
ether 0:21:28:44:67:a
root@wansimulator:~# ifconfig nge1
nge1: flags=1100943<UP,BROADCAST,RUNNING,PROMISC,MULTICAST,ROUTER,IPv4> mtu 1500 index 3
inet 10.10.50.250 netmask ffffff00 broadcast 10.10.50.255
ether 0:21:28:44:67:b
root@wansimulator:~#

---> Following is the set of IP addresses, on both networks, which will be communicating with each other:

192.168.50.103 drnode1
192.168.50.104 drnode2
192.168.50.111 drcluster
192.168.50.112 dapprep
192.168.50.113 ddbrep
10.10.50.105 pnode1
10.10.50.106 pnode2
10.10.50.111 primarycluster
10.10.50.112 papprep
10.10.50.113 pdbrep

----> Configure all the above IP addresses as follows, to simulate a 4 Mbps link over a distance of 1000 miles with a 250 millisecond delay, dropping approximately 5% of packets and corrupting 5% of packets such that 2 bytes in each affected packet are corrupted.

root@wansimulator:~# ifhit 10.10.50.105 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:10.10.50.105)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~#

====> Test the above by pinging pnode1 (10.10.50.105) from drnode1.
root@drnode1:~# ping -s pnode1
PING pnode1: 56 data bytes
64 bytes from pnode1 (10.10.50.105): icmp_seq=0. time=247. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=1. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=2. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=3. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=4. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=6. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=7. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=8. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=9. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=10. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=11. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=12. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=13. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=14. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=15. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=16. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=17. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=18. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=19. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=20. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=21. time=248. ms
^C
----pnode1 PING Statistics----
23 packets transmitted, 21 packets received, 8% packet loss
round-trip (ms) min/avg/max/stddev = 247./247.9/248./0.262

root@drnode1:~# ping -s pnode1
PING pnode1: 56 data bytes
64 bytes from pnode1 (10.10.50.105): icmp_seq=0. time=250. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=1. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=2. time=248. ms
64 bytes from pnode1 (10.10.50.105): icmp_seq=3. time=248. ms
^C
----pnode1 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 248./248./250./1.3
root@drnode1:~#


===> Do the same for all the IP addresses (a small loop that automates this is sketched after the listing below).

root@wansimulator:~# ifhit 10.10.50.106 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:10.10.50.106)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 10.10.50.111 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:10.10.50.111)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 10.10.50.112 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:10.10.50.112)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 10.10.50.113 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:10.10.50.113)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~#
root@wansimulator:~# ifhit 192.168.50.103 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:192.168.50.103)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 192.168.50.104 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:192.168.50.104)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 192.168.50.111 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:192.168.50.111)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 192.168.50.112 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:192.168.50.112)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~# ifhit 192.168.50.113 -b 4096 -d 5 -c 5 -C 2 -l 250
(::ffff:192.168.50.113)
Bandwidth 524288 (bytes/s) (4096.000000 Kbps)
Drop rate = 5.00 %
Delay = 250 ms
Corruption rate = 5.00 %, corruption count = 2 bytes/packet
root@wansimulator:~#
root@wansimulator:~# ifhit
hxbt target list:
::ffff:10.10.50.105
::ffff:10.10.50.106
::ffff:192.168.50.103
::ffff:192.168.50.104
::ffff:10.10.50.111
::ffff:10.10.50.112
::ffff:10.10.50.113
::ffff:192.168.50.111
::ffff:192.168.50.112
::ffff:192.168.50.113

root@wansimulator:~#
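Rather than running the command once per address, a small shell loop (a sketch reusing the exact ifhit flags from above) does the same thing:

# apply the same WAN profile to every target on both networks
for ip in 10.10.50.105 10.10.50.106 10.10.50.111 10.10.50.112 10.10.50.113 \
          192.168.50.103 192.168.50.104 192.168.50.111 192.168.50.112 192.168.50.113; do
    ifhit $ip -b 4096 -d 5 -c 5 -C 2 -l 250
done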

===> After configuring all the IP addresses, the delay doubles, since hxbt now takes a hit on the outgoing path of both sides:

root@pnode1:~# ping -s drnode1
PING drnode1: 56 data bytes
64 bytes from drnode1 (192.168.50.103): icmp_seq=0. time=495. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=1. time=492. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=3. time=492. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=4. time=493. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=5. time=492. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=6. time=492. ms
64 bytes from drnode1 (192.168.50.103): icmp_seq=8. time=492. ms
^C
----drnode1 PING Statistics----
9 packets transmitted, 7 packets received, 22% packet loss
round-trip (ms) min/avg/max/stddev = 492./493./495./0.95
root@pnode1:~#
===> A ping within the same network shows no added delay and no packet drops:
root@pnode1:~# ping -s pnode2
PING pnode2: 56 data bytes
64 bytes from pnode2 (10.10.50.106): icmp_seq=0. time=0.991 ms
64 bytes from pnode2 (10.10.50.106): icmp_seq=1. time=0.143 ms
64 bytes from pnode2 (10.10.50.106): icmp_seq=2. time=0.166 ms
64 bytes from pnode2 (10.10.50.106): ^C
----pnode2 PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.143/0.363/0.991/0.42
root@pnode1:~#

Conclusion:
Since we configured 4 Mbps each way, send & receive together get 8 Mbps. Hence one should do the appropriate computation to simulate the required network: for instance, if you want to simulate an 8 Mbps link, split it into two halves, one for each direction. The same applies to the delay, as the pings above show: a 250 ms hit configured on each side yields a round-trip time of roughly 500 ms.
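Concretely, with the flags used in this case study, the 8 Mbps link comes from giving targets on each side 4 Mbps each:

# -b 4096 => 4096 Kbps = 524288 bytes/s in one direction (per the ifhit output above)
# configured on targets on both sides => 4 Mbps each way, 8 Mbps aggregate
ifhit 10.10.50.105 -b 4096 -d 5 -c 5 -C 2 -l 250
ifhit 192.168.50.103 -b 4096 -d 5 -c 5 -C 2 -l 250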

References:
OpenSolaris
Project Crossbow
README on hit-box (the wan-simulator, from the project page)
The networking community page has a reference to hxbt.

Acknowledgment:
I would like to thank Sunay Tripathi and Kais Belgaied for their support in helping me understand the working of the wan-simulator, and for creating it.

Friday, June 19, 2009

Testing scribefire for the first time

I got ScribeFire installed and restarted the browser.
Configuring ScribeFire for Blogspot is very simple.
Just provide the blog URL; ScribeFire picked up the appropriate API, and I provided the username & password.
Looks cool...


Monday, June 18, 2007

Solving Performance issue of a Multi-threaded application on Solaris, Linux & Windows

On Solaris 10 ::
My colleague approached me with the following problem.
Brief background:: his application is a multi-threaded application running on a T2000, a server built around the multi-core, multi-threaded UltraSPARC T1 processor. He asks:
How do I know which process is running on a CPU? Why is it running on that CPU? And what is it doing there?
To start with, there are tools around, like "mpstat", which he had used to narrow the problem down to a particular CPU. In this case it was CPU 0 that was busy almost all the time while the other CPUs were free.
And mpstat showed that "ithr" (network interrupts being handled as threads by the CPU) was flagging large numbers on only one of the CPUs. That gave us a starting point to drill down: we knew that CPU 0 was consuming most of the system resources while all the other CPUs were lightly loaded.
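For reference, a typical way to take that first look (the sampling interval is arbitrary):

# per-CPU statistics every 5 seconds; watch for one CPU whose "ithr"
# (interrupts handled as threads) column dwarfs the others
mpstat 5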

We tried digging a step (1) deeper to find out who is running on CPU 0, as follows:
# dtrace -n 'sched:::on-cpu /cpu == 0/ { @[execname, pid] = count() }'
>> This found the culprit process name and its pid. This DTrace one-liner can be used to identify whichever process is occupying the CPU most of the time within the sample interval. It showed that this particular application was consuming most of the CPU time.

With this process name & process id, we dug a step (2) further to find out what the process is doing, as follows:
# dtrace -n 'syscall:::entry /pid == 8876/ { @[probefunc] = count() }'
>> With this it is clear which system calls are keeping the system busy, and whether that is expected of this application. The answer is yes: this is a network-intensive application that is busy all the time doing network-related read & write operations. From this we could make out that the top system calls being repeatedly made by the process are read/write operations on the network.

Now looking at the solution: on Solaris 10u3 and above we can apply the following changes to the system's kernel parameters (in /etc/system) so that incoming network requests are handled by all the available CPUs:

set ip:ip_squeue_fanout=1
set ip:ip_squeue_bind=0
* the value below has to be based on the number of CPUs or cores available.
set ip:ip_soft_rings_cnt=16

After applying these changes one has to reboot the system. Measuring the performance of the same application after the reboot gave almost double the performance, and it kept scaling up as we enabled more CPUs on the system, since the incoming threads were now handled by all the available CPUs. This varies from application to application; being a well-written multi-threaded application, it scaled well in this case. We could now see that all the CPUs were handling the incoming network interrupts and all the CPUs were equally busy.

In a live production environment one can use ndd commands to change the dynamic kernel parameters on the running system and see the effect immediately, without rebooting the node. Here are the ndd commands that can be used on a live system to set some of the above /etc/system values on the running kernel:

ndd -set /dev/ip ip_squeue_fanout 1
ndd -set /dev/ip ip_squeue_bind 0

To read the values from the live kernel, one can use the "ndd -get" option.
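For example, to read back one of the values set above:

ndd -get /dev/ip ip_squeue_fanout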

Description of the parameters that are set::

ip_squeue_fanout: Controls whether incoming connections from one NIC are fanned out across all CPUs. A value of 0 means incoming connections are assigned to the squeue attached to the CPU that took the interrupt. A value of 1 means the connections are fanned out across all CPUs. The latter is required when the NIC is faster than the CPU (say, a 10 Gb NIC) and multiple CPUs need to service the NIC. Set by way of /etc/system by adding the following line:

set ip:ip_squeue_fanout=1

ip_squeue_bind: Controls whether worker threads are bound to specific CPUs or not. When bound (default), they give better locality. The non-default value (don't bind) should be chosen only when processor sets are to be created on the system. Unset by way of /etc/system by adding the following line:

set ip:ip_squeue_bind=0

ip_soft_rings_cnt: Determines the number of squeues used to fan out the incoming TCP/IP connections. The incoming traffic is placed on one of the rings; if the ring is overloaded, packets are dropped. For every packet that gets dropped, the kstat dls counter, dls_soft_ring_pkt_drop, is incremented.
Default: 2
Range: 0 - nCPUs, where nCPUs is the maximum number of CPUs in the system
Dynamic? No. The interface should be plumbed again when changing this parameter.
When to change? Consider setting this parameter to a value greater than 2 on systems that have 10 Gbps NICs and many CPUs.

set ip:ip_soft_rings_cnt=16
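To check whether the soft rings are overloaded, the drop counter mentioned above can be read via kstat; a sketch, with the module name taken from the description above:

# look for soft-ring packet drops in the dls kstat module
kstat -m dls | grep dls_soft_ring_pkt_drop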
Note:: By looking at mpstat one can already tell that it is the network interrupts that are the problem; but beyond that, if a developer is curious to know whether his/her own application is in this state, this analysis helps establish that it is one's own application that is running into the limitations of the default settings of the OS.

On Linux (RedHat/SuSe etc) & Windows (XP/Vista or any latest server) ::
If we happen to come across a similar problem on Linux, we would use top to find the top process consuming most of the system's resources and drill down from there, probably taking an "strace" of that process to see which system calls it is making, capturing all the output in a file and post-processing it to find which system calls are made most frequently. However, the overhead that strace brings to the application is too high, so one would avoid using it in a production environment.
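For a quick, short-lived look, strace can also do the counting itself; a sketch (the pid is the one from the DTrace example above, and the attach should be kept brief given the overhead):

# attach to the process, count system calls, detach with Ctrl-C;
# -c prints a per-syscall summary instead of tracing every call
strace -c -p 8876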

If we happen to come across a similar problem on Windows, one would look at the Windows Task Manager GUI for the top applications consuming resources, which can be sorted on various parameters like CPU, memory, etc. To drill down, one can use Windows' native performance tools, which give high-level info on what is happening in the system, and use third-party tools to profile a given application and understand what it is doing.

Well-known profiling tools for multi-threaded applications on Windows & Linux include "Intel® Thread Profiler 3.1 for Windows", the Intel Thread Profiler for Linux, etc.

I am open to learning about more tools on the Linux & Windows platforms which can help drill into these problems easily without taxing the overall application or system performance.
