Discussion:
[Ganglia-general] Setting up Ganglia
Johann Spies
2008-11-26 13:51:55 UTC
I am not having much success getting Ganglia to work on a cluster
of 22 computers.

It is an openSUSE system. I have installed gmond on all the nodes
and it is running on all of them.

On the head-node I have the following in /etc/ganglia/gmetad.conf:

data_source "Ratashasta" localhost:8649 192.168.129.2:8649 192.168.129.3:8649 192.168.129.4:8649 \
192.168.129.5:8649 192.168.129.6:8649 192.168.129.7:8649 192.168.129.8:8649 192.168.129.9:8649 \
192.168.129.10:8649 192.168.129.11:8649 192.168.129.12:8649 \
192.168.129.13:8649 192.168.129.14:8649 192.168.129.15:8649 \
192.168.129.16:8649 192.168.129.17:8649 192.168.129.18:8649 \

and

trusted_hosts 127.0.0.1 192.168.129.2 192.168.129.3 192.168.129.4 \
192.168.129.5 192.168.129.6 192.168.129.7 192.168.129.8 192.168.129.9 \
192.168.129.10 192.168.129.11 192.168.129.12 \
192.168.129.13 192.168.129.14 192.168.129.15 \
192.168.129.16 192.168.129.17 192.168.129.18 \
192.168.129.19 192.168.129.20 192.168.129.21 192.168.129.22

When I run sudo gstat -a, the result is:


CLUSTER INFORMATION
Name: Ratashta
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Wed Nov 26 15:48:15 2008

CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]

head001.sun.ac.za
0 ( 0/ 289) [ 0.04, 0.31, 0.52] [ 0.1, 0.0, 0.0, 99.4, 0.4] OFF

I have changed the firewall through YaST to allow UDP traffic on port
8649 from 146.232.128.108/32 as well as from 192.168.129.0/24 (the
nodes on the internal network).

I have gmond running on all the nodes.

I don't see any sign that iptables is blocking communication between the
head-node and the other nodes, and I don't know what to look for next.

Any help will be appreciated.


Regards
Johann
--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

"In the beginning God created the heaven and the
earth."
Genesis 1:1
Brad Nicholes
2008-11-26 17:11:40 UTC
Your gmetad data_source should not be trying to talk to every gmond in your cluster. In multicast mode, which is the default, every gmond talks to every other gmond and stores metrics for the entire cluster. This means that gmetad only needs to poll one gmond node to get the metrics for the entire cluster. The only reason to include more than one address in the data_source is failover: if the primary gmond goes down, the secondary address in the data_source will pick up and report for the rest of the cluster. Even in unicast mode, where all of the gmond nodes talk to a single gmond node rather than to every other node, your data_source should reference only the master gmond node.

Brad
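
To illustrate the point above, here is a minimal Python sketch of what polling a single gmond amounts to: open a TCP connection to its tcp_accept_channel, read the XML dump until the connection closes, and list the HOST elements it contains. The function names (fetch_gmond_xml, first_reachable, host_names) are hypothetical helpers written for this example, not part of Ganglia, and the try-in-order loop is only an approximation of the failover behaviour described above.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump that gmond writes on its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the dump is sent
                break
            chunks.append(data)
    return b"".join(chunks)


def first_reachable(sources, fetch=fetch_gmond_xml):
    """Try each (host, port) in order and return the first dump received."""
    for host, port in sources:
        try:
            return fetch(host, port)
        except OSError:
            continue  # this gmond is unreachable, try the next one
    raise ConnectionError("no gmond in the list answered")


def host_names(xml_bytes):
    """Return the sorted NAME attributes of all HOST elements in a dump."""
    root = ET.fromstring(xml_bytes)
    return sorted(host.get("NAME") for host in root.iter("HOST"))
```

Calling host_names(first_reachable([("localhost", 8649)])) on the head-node should list every node that the local gmond knows about; if only the head-node itself appears, the problem lies between the gmond daemons rather than in gmetad.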
Brad Fino
2008-11-26 22:52:40 UTC
I think he's just confusing the roles of gmetad and gmond.

Johann,
Your flow should look like this:
gmond client (node) --> gmond server (cluster master) <--> gmetad

So my gmond.conf for a node (www001) in Cluster A looks like this: (I use
unicast as opposed to multicast)

udp_send_channel {
  mcast_join = mon1
  port = 8650
  ttl = 1
}

This sends data to a gmond running on another server, which collects data
from all the nodes (www002, www003, etc.).

My gmond.conf on the ClusterA master (mon1) looks like:

udp_send_channel {
  mcast_join = mon1
  port = 8650
  ttl = 1
}

udp_recv_channel {
  port = 8650
  family = inet4
}

tcp_accept_channel {
  port = 8650
}

My gmetad.conf on my grid master looks like:

data_source "$foo" mon1:8661


--
Brad E. Fino
***@rockyou.com
858-245-9099
Johann Spies
2008-11-27 08:52:51 UTC
Hello Brad,
Post by Brad Fino
I think he's just confusing the roles of gmetad and gmond.
Obviously. :(
Post by Brad Fino
gmond client (node) --> gmond server (cluster master) <--> gmetad
So my gmond.conf for a node (www001) in Cluster A looks like this: (I use
unicast as opposed to multicast)
Can you please explain what the difference in the setup is? When it
comes to multicasting I am a total newbie. For example, I have tried
the following (based on your example):


udp_recv_channel {
  mcast_join = 192.168.129.1
  port = 8649
  family = inet4
}

But then I get the error:

Error creating multicast server mcast_join=192.168.129.1 port=8649 mcast_if=NULL family='inet4'. Exiting.

Regards
Johann
--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

"Thou, even thou, art LORD alone; thou hast made
heaven, the heaven of heavens, with all their host,
the earth, and all things that are therein, the seas,
and all that is therein, and thou preservest them all;
and the host of heaven worshippeth thee."
Nehemiah 9:6
Carlo Marcelo Arenas Belon
2008-11-27 09:22:42 UTC
Post by Johann Spies
Can you explain to me what the difference is in the setup please.
There is a lot of interesting information at:

http://ganglia.wiki.sourceforge.net/ganglia_documents

The IBM wiki link is a very good step-by-step description of how to get
Ganglia working (even if it is more oriented to IBM systems).

But there is an easier way to get going if you have a typical cluster
configuration, meaning:

* all your nodes are in the same VLAN and have 1 active network interface
* you have gmond installed in all nodes
* you have gmetad installed in at least 1 node with a frontend with PHP

In that case, all you have to do is:

* stop iptables, selinux or anything else that might get in the way
* start gmond in all nodes with the default configuration, or regenerate
one on each node using `gmond -t`
* start gmetad with the default configuration
* start apache and go to the "/ganglia" directory to see your cluster
Post by Johann Spies
knowledge of multicasting is that of a total newbie. I have for
udp_recv_channel {
mcast_join = 192.168.129.1
port = 8649
family = inet4
}
Error creating multicast server mcast_join=192.168.129.1 port=8649
mcast_if=NULL family='inet4'. Exiting.
192.168.129.1 is not a multicast address (IPv4 multicast addresses lie in
224.0.0.0/4). The one provided in the default configuration should work
better, as long as no other gmond is already running (and no other
process is already using port 8649).

Carlo
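
As a quick way to check whether an address is usable for mcast_join, here is a small Python sketch using the standard ipaddress module. The helper name is_multicast is made up for this example; 239.2.11.71 is the group address that appears in the default gmond configuration quoted later in this thread.

```python
import ipaddress


def is_multicast(address):
    """Return True if the address lies in a multicast range (224.0.0.0/4 for IPv4)."""
    return ipaddress.ip_address(address).is_multicast


for candidate in ("192.168.129.1", "239.2.11.71"):
    print(candidate, is_multicast(candidate))
# 192.168.129.1 False
# 239.2.11.71 True
```

An ordinary host address such as 192.168.129.1 belongs in a unicast channel definition, not in mcast_join.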
Johann Spies
2008-11-28 08:59:41 UTC
Post by Carlo Marcelo Arenas Belon
http://ganglia.wiki.sourceforge.net/ganglia_documents
I have seen them all.
Post by Carlo Marcelo Arenas Belon
the IBM wiki link is a very good step by step description of how to get
ganglia working (even if it is more oriented to IBM systems)
I have seen this also.
Post by Carlo Marcelo Arenas Belon
but the easier way to get you going if you have a typical cluster
* all your nodes are in the same VLAN and have 1 active network interface
* you have gmond installed in all nodes
* you have gmetad installed in at least 1 node with a frontend with PHP
Done this
Post by Carlo Marcelo Arenas Belon
* stop iptables, selinux or anything else that might get in the way
* start gmond in all nodes with the default configuration, or regenerate
one on each node using `gmond -t`
* start gmetad with the default configuration
* start apache and go to the "/ganglia" directory to see your cluster
I did all of this when I started out, except for disabling iptables
(which I have done now), and I still get:

/usr/local/sbin/gmond -t > /etc/ganglia/gmond.conf
sudo gstat -a
CLUSTER INFORMATION
Name: unspecified
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Fri Nov 28 10:55:13 2008

CLUSTER HOSTS
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]

head001.sun.ac.za
4 ( 1/ 293) [ 0.94, 0.56, 0.76] [ 24.9, 0.0, 0.7, 73.2, 1.3] OFF


There is no information from comp001-comp021, although gmond is running
on each one of them.

The reason for my previous mail was that the person described a
solution for his/her situation using unicast rather than multicast.
I suspect multicast is not working on my system.

Regards
Johann
--
Johann Spies Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

"The earth is the LORD'S, and the fulness thereof; the
world, and they that dwell therein." Psalms 24:1
Carlo Marcelo Arenas Belon
2008-11-28 12:12:56 UTC
Post by Johann Spies
/usr/local/sbin/gmond -t > /etc/ganglia/gmond.conf
I presume you restarted gmond after changing the configuration.
Post by Johann Spies
With no information from comp001-comp021 although gmond is running on
each one of them.
If iptables is off on all of them and they are all (including head001) in the same network
segment, then you have a network problem.

# tcpdump host 239.2.11.71

should show that you are receiving multicast messages from all the nodes, but
it will probably show packets only from head001.
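
If tcpdump is not available, a rough equivalent of that check can be sketched in Python: join the multicast group, listen for a while, and collect the source addresses of the packets that arrive. This is only an illustration under stated assumptions. The names build_mreq and senders_seen are hypothetical, the group and port are the defaults quoted in this thread, and binding to port 8649 may fail while a local gmond holds the port.

```python
import socket
import time

GROUP = "239.2.11.71"  # default gmond multicast group quoted in this thread
PORT = 8649


def build_mreq(group, interface="0.0.0.0"):
    """Pack an ip_mreq structure: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def senders_seen(group=GROUP, port=PORT, seconds=30.0):
    """Join the group and return the set of source IPs seen within `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    sock.settimeout(1.0)
    seen = set()
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                _payload, (address, _port) = sock.recvfrom(65535)
            except socket.timeout:
                continue  # nothing arrived in the last second, keep waiting
            seen.add(address)
    finally:
        sock.close()
    return seen
```

Running print(sorted(senders_seen())) on the head-node and seeing only the head-node's own address would point to the same conclusion as the tcpdump output: multicast traffic from the other nodes is not arriving.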
Post by Johann Spies
The reason for my previous mail was that the person described a
solution for his/her situation using unicast rather than multicast.
I suspect multicast is not working on my system.
To change to unicast, change the configuration from:

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}

to the following (assuming all your nodes can resolve that name; otherwise use the IP address):

udp_send_channel {
  host = head001.sun.ac.za
  port = 8649
}

udp_recv_channel {
  port = 8649
}

Then restart gmond on every node with the new configuration (I presume
iptables is disabled on all the other nodes as well).

Carlo
Johann Spies
2008-11-28 12:35:02 UTC
Post by Carlo Marcelo Arenas Belon
to change to unicast what you have to do is change the configuration
Thanks! Changing the configuration to use unicast made a big
difference. It is working now.

Regards
Johann