Wednesday, 8 June 2016

VDBENCH: Waiting for configuration information from slave: hd2, Waiting for configuration information from slave: hd1

Below is the stdout from running the vdbench command with journaling and data validation enabled. The vdbench master keeps waiting for configuration information from both slaves (hd1 and hd2), which it never receives, until the heartbeat monitor times out after 180 seconds, as shown below.

10:22:33.924 Anchor size: anchor=/mnt/lustre/a: dirs:            6; files:            8; bytes:     1.000m (1,048,576)
10:22:33.926 Anchor size: anchor=/mnt/lustre/b: dirs:            6; files:            8; bytes:     1.000m (1,048,576)
10:22:33.928 Estimated totals for all 2 anchors: dirs: 12; files: 16; bytes: 2.000m
10:22:33.977 Starting slave: /home/user/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n userclient-10-160608-10.22.33.794 -l hd1-0 -p 5570  
10:22:34.000 Starting slave: ssh kirtanost -l root /home/kirtan/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n userost-11-160608-10.22.33.794 -l hd2-0 -p 5570
10:22:34.451 All slaves are now connected
10:23:05.485 Waiting for configuration information from slave: hd2
10:23:05.485 Waiting for configuration information from slave: hd1
10:23:35.496 Waiting for configuration information from slave: hd2
10:23:35.496 Waiting for configuration information from slave: hd1
10:24:05.503 Waiting for configuration information from slave: hd2
10:24:05.503 Waiting for configuration information from slave: hd1
10:24:35.513 Waiting for configuration information from slave: hd2
10:24:35.513 Waiting for configuration information from slave: hd1
10:25:05.522 Waiting for configuration information from slave: hd2
10:25:05.522 Waiting for configuration information from slave: hd1
10:25:35.537 Waiting for configuration information from slave: hd2
10:25:35.537 Waiting for configuration information from slave: hd1
10:26:05.545 Waiting for configuration information from slave: hd2
10:26:05.545 Waiting for configuration information from slave: hd1
10:26:19.460 HeartBeat.checkHeartBeat(): slave hd1-0 has not responded for 185 seconds.
10:26:19.460 HeartBeat.checkHeartBeat(): slave hd2-0 has not responded for 185 seconds.
10:26:19.461 Start/end command: executing '/home/user/vdbench/vdbench jstack'
10:26:19.461 execute(): /home/user/vdbench/vdbench jstack
10:26:19.686 *
10:26:19.686 ***************************************************************************************
10:26:19.686 * Slave hd2-0 aborting: Heartbeat monitor: Master did not respond. Timeout value: 180 *
10:26:19.686 ***************************************************************************************
10:26:19.686 *
10:26:19.686 Slave hd1-0 killed by master
10:26:19.775 *
10:26:19.775 ***************************************************************************************
10:26:19.775 * Slave hd1-0 aborting: Heartbeat monitor: Master did not respond. Timeout value: 180 *
10:26:19.775 ***************************************************************************************
10:26:19.775 *
10:26:19.896 tg: java.lang.ThreadGroup[name=main,maxpri=10]
10:26:19.897 tg.get_name: main
10:26:19.897 tg.activeCount: 8
java.lang.ThreadGroup[name=main,maxpri=10]
Thread[Vdbmain,5,main]
Thread[SlaveStarter userclient-10-160608-10.22.33.794,5,main]
Thread[Get_cmd_stream stderr /home/kirtan/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n kirtanclient-10-160608-10.22.33.794 -l hd1-0 -p 5570,5,main]
Thread[Get_cmd_stream stdout /home/user/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n kirtanclient-10-160608-10.22.33.794 -l hd1-0 -p 5570,5,main]
Thread[SlaveStarter kirtanost-11-160608-10.22.33.794,5,main]
Thread[Get_cmd_stream stderr ssh userost -l root /home/user/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n userost-11-160608-10.22.33.794 -l hd2-0 -p 5570,5,main]
Thread[Get_cmd_stream stdout ssh kirtanost -l root /home/kirtan/vdbench/vdbench SlaveJvm -m 192.168.56.101 -n kirtanost-11-160608-10.22.33.794 -l hd2-0 -p 5570,5,main]
Thread[Check Slave HeartBeat,5,main]
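To make the timeline in the output above easier to read, here is a minimal Python sketch that counts the "Waiting for configuration information" messages per slave and measures how long each slave stayed in that state. It is not part of vdbench: the function name `summarize_waits` and the `SAMPLE` excerpt are only illustrative, and the regular expression assumes the timestamp layout seen in this log.

```python
import re
from datetime import datetime

# Matches lines such as:
# 10:23:05.485 Waiting for configuration information from slave: hd2
# (assumed format, taken from the stdout shown in this post)
WAIT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"Waiting for configuration information from slave: (\S+)"
)


def summarize_waits(log_text):
    """Return {slave: (message_count, seconds_between_first_and_last_message)}."""
    seen = {}
    for line in log_text.splitlines():
        match = WAIT_RE.match(line.strip())
        if not match:
            continue  # ignore every other kind of log line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        slave = match.group(2)
        count, first, _last = seen.get(slave, (0, stamp, stamp))
        seen[slave] = (count + 1, first, stamp)
    return {
        slave: (count, (last - first).total_seconds())
        for slave, (count, first, last) in seen.items()
    }


# Short excerpt of the log above, used only to demonstrate the function.
SAMPLE = """\
10:22:34.451 All slaves are now connected
10:23:05.485 Waiting for configuration information from slave: hd2
10:23:05.485 Waiting for configuration information from slave: hd1
10:26:05.545 Waiting for configuration information from slave: hd2
10:26:05.545 Waiting for configuration information from slave: hd1
"""

if __name__ == "__main__":
    for slave, (count, seconds) in sorted(summarize_waits(SAMPLE).items()):
        print(f"{slave}: {count} wait messages over {seconds:.3f} s")
```

Run against the full stdout, this shows whether both slaves stall for the same duration, which is what the log above suggests.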

This is the parameter file:

*Host definition
hd=default,vdbench=/home/user/vdbench,user=root
hd=hd1,system=userclient,user=root,shell=ssh
hd=hd2,system=usernost,user=root,shell=ssh

*File system definition
fsd=fsd1,anchor=/mnt/lustre/a,depth=2,width=2,files=2,size=128k
fsd=fsd2,anchor=/mnt/lustre/b,depth=2,width=2,files=2,size=128k

*File system Workload definition
fwd=fwd1,host=hd1,fsd=fsd1,operation=read,xfersize=4k,fileio=sequential,fileselect=random,threads=2
fwd=fwd2,host=hd2,fsd=fsd2,operation=read,xfersize=4k,fileio=sequential,fileselect=random,threads=2

*Run definition
rd=rd1,fwd=fwd*,fwdrate=max,format=yes,elapsed=10,interval=1,operations=(read,write,getattr,setattr)

Can somebody explain why the script is not working?
