I'm trying to test my 3-node Cassandra (3.11.3.5) cluster with cassandra-stress. Currently I'm running the 3 nodes plus 1 machine where cassandra-stress runs, everything on an OpenVPN network.
I have created my .yaml user profile test file, shown here:
### DML ###
# Keyspace Name
keyspace: mykeyspace
# The CQL for creating a keyspace (optional if it already exists)
keyspace_definition: |
  CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'} AND durable_writes = false;
# Table name
table: mytable
# The CQL for creating a table you wish to stress (optional if it already exists)
table_definition: |
  CREATE TABLE mytable (
    id bigint,
    type int,
    txt text,
    event_datetime timestamp,
    bigtxt text,
    page int,
    PRIMARY KEY ((id, type), page, event_datetime)
  ) WITH CLUSTERING ORDER BY (page DESC, event_datetime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = 'ciao'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 90000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
### Column Distribution Specifications ###
columnspec:
  - name: id
    size: gaussian(1..1000)
    population: gaussian(1..500k)
  - name: type
    size: gaussian(0..5)
    population: gaussian(1..5)
  - name: event_datetime
    cluster: fixed(1)
  - name: page
    size: ~exp(1..20)
    population: ~exp(1..20)
    cluster: fixed(1)
  - name: txt
    size: exp(30..1k)
  - name: bigtxt
    size: gaussian(10k..30M)
### Batch Ratio Distribution Specifications ###
insert:
  partitions: fixed(1)    # Our partition key is the domain so only insert one per batch
  select: fixed(1)/1
  batchtype: UNLOGGED     # Unlogged batches
queries:
  pages:
    cql: select id, page, type, txt, event_datetime, bigtxt from mytable where id = ? and type = ? and page = ? limit 10;
    fields: multirow
The commands I have run so far are modifications of this one:
cassandra-stress user n=30 profile=./myprofile.yml ops\(insert=1\) -rate threads=10 -node 10.5.0.1,10.5.0.6,10.5.0.8
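So far I have only stressed inserts; as I understand it, the pages read query defined in the profile could also be exercised by naming it in ops(), something like this (the 1:1 ratio is only an example, not something I have actually run):
cassandra-stress user n=30 profile=./myprofile.yml ops\(insert=1,pages=1\) -rate threads=10 -node 10.5.0.1,10.5.0.6,10.5.0.8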
What I have changed between runs (example variants are shown below):
- n
- the thread count
- the -rate throttle option
- the -rate fixed option
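For illustration, the throttled and fixed-rate variants look roughly like this (the 50/s rate is a placeholder, not the exact value I used):
cassandra-stress user n=30 profile=./myprofile.yml ops\(insert=1\) -rate threads=10 throttle=50/s -node 10.5.0.1,10.5.0.6,10.5.0.8
cassandra-stress user n=30 profile=./myprofile.yml ops\(insert=1\) -rate threads=10 fixed=50/s -node 10.5.0.1,10.5.0.6,10.5.0.8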
Whatever I change, I always get errors like:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: 10.5.0.1/10.5.0.1:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [10.5.0.1/10.5.0.1] Timed out waiting for server response), 10.5.0.6/10.5.0.6:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [10.5.0.6/10.5.0.6] Timed out waiting for server response), 10.5.0.8/10.5.0.8:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [10.5.0.8/10.5.0.8] Timed out waiting for server response))
I also get these errors during the warm-up phase.
And the run results look something like this:
Results:
Op rate : 0 op/s [insert: 1 op/s]
Partition rate : 0 pk/s [insert: 1 pk/s]
Row rate : 2 row/s [insert: 3 row/s]
Latency mean : 38172.3 ms [insert: 38,172.3 ms]
Latency median : 37279.0 ms [insert: 37,279.0 ms]
Latency 95th percentile : 59190.0 ms [insert: 59,190.0 ms]
Latency 99th percentile : 59458.5 ms [insert: 59,458.5 ms]
Latency 99.9th percentile : 59458.5 ms [insert: 59,458.5 ms]
Latency max : 59458.5 ms [insert: 59,458.5 ms]
Total partitions : 30 [insert: 30]
Total errors : 0 [insert: 0]
Total GC count : 2
Total GC memory : 3.396 GiB
Total GC time : 0.2 seconds
Avg GC time : 117.0 ms
StdDev GC time : 0.0 ms
Total operation time : 00:01:04
What I cannot understand is:
- Why is the latency completely different from what I get if I run nodetool cfhistograms mykeyspace mytable on the nodes? On the nodes I see something like 200 ms at the 99th percentile for writes. Could this be related to the time cassandra-stress spends sending MBs of data to the coordinator?
- Why don't I see any effect from changing the throttle/fixed rate options in the cassandra-stress command? I always get 1 op/s.
- Why do I get the errors above so often, when my production cluster, which is similar to the test one, doesn't have them even though it receives a much heavier mixed workload (600k reads and 1M writes in about 3 hours)?