
Testing MySQL with a Large Data Set

Bulk-inserting data

// Laravel framework code: insert 500,000 rows in 100 batches of 5,000
$s = microtime(true);

for ($j = 0; $j < 100; $j++) {
    $data = [];
    $ss = microtime(true);
    for ($i = 0; $i < 5000; $i++) {
        $data[] = [
            'name' => str_random(64)." ".($j+1)*($i+1),
            'cat1' => str_random(32),
            'cat2' => str_random(32),
            'cat3' => str_random(32),
            'num1' => mt_rand(10000, 99999),
            'num2' => mt_rand(10000, 99999),
            'num3' => mt_rand(10000, 99999).".01",
            'num4' => mt_rand(10000, 99999),
            'num5' => mt_rand(10000, 99999),
        ];
    }
    // One multi-row INSERT per 5,000-row batch
    DB::table('list_test')->insert($data);
    $ee = microtime(true);

    Log::info((float)($ee - $ss)); // per-batch time, including data generation
}

$e = microtime(true);
Log::info('Total:'.(float)($e - $s));
// Output
[2015-11-30 13:56:10] local.INFO: 6.0573468208313  
[2015-11-30 13:56:15] local.INFO: 4.7142698764801  
[2015-11-30 13:56:20] local.INFO: 4.7502720355988  
[2015-11-30 13:56:24] local.INFO: 4.6482660770416  
[2015-11-30 13:56:29] local.INFO: 4.8062751293182  
[2015-11-30 13:56:34] local.INFO: 4.8232760429382  
[2015-11-30 13:56:39] local.INFO: 4.5732619762421  
[2015-11-30 13:56:43] local.INFO: 4.7092700004578  
[2015-11-30 13:56:48] local.INFO: 4.9792850017548  
[2015-11-30 13:56:53] local.INFO: 4.8652780056  
[2015-11-30 13:56:58] local.INFO: 4.9532828330994  
[2015-11-30 13:57:03] local.INFO: 4.9182820320129
.................................................
[2015-11-30 14:03:40] local.INFO: 4.8382771015167  
[2015-11-30 14:03:45] local.INFO: 4.7792727947235  
[2015-11-30 14:03:50] local.INFO: 4.9002799987793  
[2015-11-30 14:03:54] local.INFO: 4.7752728462219  
[2015-11-30 14:03:59] local.INFO: 4.8892788887024  
[2015-11-30 14:04:04] local.INFO: 4.7812731266022  
[2015-11-30 14:04:09] local.INFO: 4.9482831954956  
[2015-11-30 14:04:14] local.INFO: 5.0152869224548  
[2015-11-30 14:04:19] local.INFO: 5.0332868099213  
[2015-11-30 14:04:19] local.INFO: Total:494.97931194305  

Each iteration generates 5,000 rows and bulk-inserts them, for 500,000 rows in total. At roughly 5 seconds per 5,000-row batch this looks disappointing at first, but most of that time is actually spent generating the random data; the insert itself should take milliseconds, or even microseconds. Note also that the per-batch time does not grow as the table fills up — it stays around 5 seconds throughout, which by itself suggests the insert portion is tiny. The total is 495 seconds, about 8 minutes, so MySQL's bulk-insert performance is clearly acceptable, and a side-by-side comparison with alternatives would not put it far behind.
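Part of why the per-batch insert cost stays small is that Laravel's `DB::table(...)->insert($data)`, when given an array of rows, compiles everything into a single multi-row INSERT statement. A sketch of what one batch presumably sends to MySQL (the values shown are made up and abbreviated):

```sql
-- Roughly what one 5,000-row batch compiles to (hypothetical values).
-- One statement per batch means one round trip and one implicit commit,
-- rather than 5,000 separate single-row INSERTs.
INSERT INTO list_test (name, cat1, cat2, cat3, num1, num2, num3, num4, num5)
VALUES
  ('q3Xv... 1', 'aB...', 'cD...', 'eF...', 52341, 18765, 12345.01, 40001, 67890),
  ('m8Zk... 2', 'gH...', 'iJ...', 'kL...', 31415, 92653, 58979.01, 32384, 62643);
  -- ... 4,998 more rows
```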

If the time spent generating random data makes the program above hard to trust as a benchmark, the following SQL isolates the insert cost:

INSERT INTO list_test_copy(name,cat1,cat2,cat3,num1,num2,num3,num4,num5) 
	SELECT name,cat1,cat2,cat3,num1,num2,num3,num4,num5 FROM list_test

Affected rows: 500000
Time: 31.640s

This result measures MySQL itself much more accurately: bulk-copying 500,000 rows took 31.64 seconds, i.e. about 0.3164 seconds per 5,000 rows. So of the ~5 seconds per batch above, more than 4 seconds went into generating random data.

Next, a query over the full table:

SELECT l.name, l.cat1, l.cat2, l.cat3,l.num1,l.num2,l.num3,l.num4,l.num5, 
	(l.num1 - ll.num1) AS diff1,
	(l.num2 - ll.num2) AS diff2,
	(l.num3 - ll.num3) AS diff3,
	(l.num4 - ll.num4) AS diff4,
	(l.num5 - ll.num5) AS diff5,
	IF(ll.id IS NULL,1,0)
FROM list l LEFT JOIN list_ ll ON l.id = ll.id

Affected rows: 0
Time: 12.295s
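Before scanning deeper, it is worth checking how MySQL actually executes this join. A sketch, assuming `id` is the primary key on both tables (the output will vary with your MySQL version and data):

```sql
-- Prefix the query with EXPLAIN to see the join plan.
-- Expected shape: a full scan of l, with an eq_ref primary-key
-- lookup into ll for each row, since the join is on ll's PRIMARY KEY.
EXPLAIN SELECT l.name, (l.num1 - ll.num1) AS diff1, IF(ll.id IS NULL,1,0)
FROM list l LEFT JOIN list_ ll ON l.id = ll.id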

That is a 500,000-row table LEFT JOINed against a 400,000-row table, 12.295 seconds in total. Now the same query with ever larger LIMITs, up to 490,000 rows:

SELECT l.name, l.cat1, l.cat2, l.cat3,l.num1,l.num2,l.num3,l.num4,l.num5, 
	(l.num1 - ll.num1) AS diff1,
	(l.num2 - ll.num2) AS diff2,
	(l.num3 - ll.num3) AS diff3,
	(l.num4 - ll.num4) AS diff4,
	(l.num5 - ll.num5) AS diff5,
	IF(ll.id IS NULL,1,0)
FROM list l LEFT JOIN list_ ll ON l.id = ll.id LIMIT N

LIMIT N     Time
100000      2.395s
200000      3.592s
300000      4.735s
400000      7.568s
490000      11.576s

Not impressive? OK, now try large offsets, but fetching only a handful of rows:

SELECT l.name, l.cat1, l.cat2, l.cat3,l.num1,l.num2,l.num3,l.num4,l.num5, 
	(l.num1 - ll.num1) AS diff1,
	(l.num2 - ll.num2) AS diff2,
	(l.num3 - ll.num3) AS diff3,
	(l.num4 - ll.num4) AS diff4,
	(l.num5 - ll.num5) AS diff5,
	IF(ll.id IS NULL,1,0)
FROM list l LEFT JOIN list_ ll ON l.id = ll.id LIMIT N,100

Offset N    Time
100000      0.475s
200000      0.645s
300000      2.706s
400000      5.984s

Offsets up to 200,000 finish in under a second. (This is on Windows; under the same conditions on Linux, an order-of-magnitude improvement would not be an exaggeration.) At an offset of 400,000 it takes about 6 seconds — nearly all of it spent just locating the offset, which is not satisfying. In practice, though, no user is ever going to page through 100,000+ records by hand anyway.
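The usual workaround for deep-offset pagination is "keyset" (seek) pagination: instead of asking MySQL to skip N rows, remember the last `id` of the previous page and filter on it, so the primary key can jump straight to the start row. A sketch against the same tables (the `400000` here is a hypothetical last-seen id, not an offset):

```sql
-- Seek pagination: replaces LIMIT 400000,100.
-- Works only when paging in primary-key order.
SELECT l.name, l.cat1, l.cat2, l.cat3, l.num1, l.num2, l.num3, l.num4, l.num5,
	(l.num1 - ll.num1) AS diff1,
	IF(ll.id IS NULL,1,0)
FROM list l LEFT JOIN list_ ll ON l.id = ll.id
WHERE l.id > 400000   -- last id seen on the previous page (hypothetical)
ORDER BY l.id
LIMIT 100
```

The trade-off is that you can only step forward page by page, not jump to an arbitrary page number.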

Now the same large offsets on a single table, with no join:

SELECT * FROM list_copy_copy LIMIT N,200

Offset N    Time
100000      0.449s
200000      0.534s
300000      0.577s
400000      0.680s
499000      0.774s

Large offsets on a single table hold up quite well.
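Even the single-table case can usually be tightened with a "deferred join" (not part of the original test): do the expensive offset scan over the primary key only, then join back for the full rows, so the wide columns are only read for the 200 rows actually returned:

```sql
-- Deferred join: offset over the narrow PRIMARY KEY index,
-- then fetch full rows only for the final page.
SELECT t.*
FROM list_copy_copy t
JOIN (
    SELECT id FROM list_copy_copy ORDER BY id LIMIT 499000, 200
) AS page ON t.id = page.id
```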

All of the numbers above come from quick tests on my personal computer, under Windows.

Table structures used in the tests:

CREATE TABLE `list` (
`id`  int(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
`name`  varchar(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat1`  varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat2`  varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat3`  varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`num1`  int(11) NULL DEFAULT NULL ,
`num2`  int(11) NULL DEFAULT NULL ,
`num3`  decimal(5,2) NULL DEFAULT NULL ,
`num4`  float(64,20) NULL DEFAULT NULL ,
`num5`  float(64,20) NULL DEFAULT NULL ,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_unicode_ci
AUTO_INCREMENT=1
ROW_FORMAT=COMPACT
;

CREATE TABLE `list_copy` (
`id`  int(11) UNSIGNED NOT NULL AUTO_INCREMENT ,
`name`  varchar(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat1`  varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat2`  varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`cat3`  varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL ,
`num1`  int(11) NULL DEFAULT NULL ,
`num2`  int(11) NULL DEFAULT NULL ,
`num3`  decimal(5,2) NULL DEFAULT NULL ,
`num4`  float(64,20) NULL DEFAULT NULL ,
`num5`  float(64,20) NULL DEFAULT NULL ,
`diff1`  int(11) NULL DEFAULT NULL ,
`diff2`  int(11) NULL DEFAULT NULL ,
`diff3`  decimal(10,2) NULL DEFAULT NULL ,
`diff4`  float(64,20) NULL DEFAULT NULL ,
`diff5`  float(64,20) NULL DEFAULT NULL ,
`is_new`  tinyint(2) NULL DEFAULT NULL ,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_unicode_ci
AUTO_INCREMENT=1
ROW_FORMAT=COMPACT
;