Shinguz: MySQL spatial functionality – points of interest around me

存储架构 2016-06-01 阅读原文

This week I was preparing the exercises for our MySQL/MariaDB for Beginners training . One of the exercises of the training is about MySQL spatial (GIS) features . I always tell customers: "With these features you can answer questions like: Give me all points of interest around me!"

Now I wanted to try out how it really works and if it is that easy at all...

To get myself an idea of what I want to do I did a little sketch first:

My position
Shops
Restaurants
Cafes

To do this I needed a table and some data:

CREATE TABLE poi (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT
, name VARCHAR(40)
, type VARCHAR(20)
, sub_type VARCHAR(20)
, pt POINT NOT NULL
, PRIMARY KEY (id)
, SPATIAL INDEX(pt)
) ENGINE=InnoDB;

INSERT INTO poi (name, type, sub_type, pt) VALUES
  ('Shop 1', 'Shop', 'Cloth', Point(2,2))
, ('Cafe 1', 'Cafe', '',  Point(11,2))
, ('Shop 2', 'Shop', 'Cloth',  Point(5,4))
, ('Restaurant 1', 'Restaurant', 'Portugies',  Point(8,7))
, ('Cafe 2', 'Cafe', '',  Point(3,9))
, ('Shop 3', 'Shop', 'Hardware',  Point(11,9))
;

This looks as follows:

SELECT id, CONCAT(ST_X(pt), '/', ST_Y(pt)) AS "X/Y", name, type, sub_type
  FROM poi;
+----+-----------+--------------+------------+-----------+
| id | X/Y       | name         | type       | sub_type  |
+----+-----------+--------------+------------+-----------+
|  1 | 2/2       | Shop 1       | Shop       | Cloth     |
|  2 | 11/2      | Cafe 1       | Cafe       |           |
|  3 | 5/4       | Shop 2       | Shop       | Cloth     |
|  4 | 8/7       | Restaurant 1 | Restaurant | Portugies |
|  5 | 3/9       | Cafe 2       | Cafe       |           |
|  6 | 11/9      | Shop 3       | Shop       | Hardware  |
+----+-----------+--------------+------------+-----------+

Now the question: "Give me all shops in a distance of 4.5 units around me":

SET @hereami = POINT(9,4);

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.37 sec)

The query execution plan looks like this:

id: 1
  select_type: SIMPLE
        table: poi
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 650361
     filtered: 10.00
        Extra: Using where; Using filesort

So no use of the spatial index yet. 🙁

Reading the MySQL documentation Using Spatial Indexes provides some more information:

The optimizer investigates whether available spatial indexes can be involved in the search for queries that use a function such as MBRContains() or MBRWithin() in the WHERE clause.

So it looks like the optimizer CAN evaluate function covered fields in this specific case. But not with the function ST_Distance I have chosen.

So my WHERE clause must look like: "Give me all points within a polygon spanned 4.5 units around my position..."

I did not find any such function in the short run. So I created a hexagon which is not too far from a circle...

With this hexagon I tried again:

SET @hereami = POINT(9,4);
SET @hexagon = 'POLYGON((9 8.5, 12.897 6.25, 12.897 1.75, 9 -0.5, 5.103 1.75, 5.103 6.25, 9 8.5))';

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(ST_GeomFromText(@hexagon), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
Empty set (0.03 sec)

And tadaaah: Damned fast, but the result is not the same! 🙁 When you look at the graph above it is obvious why: The missing shop is 0.103 units outside of our hexagon search range but within our circle range. So an octagon would have been the better approach...

At least the index is considered now! 🙂

id: 1
  select_type: SIMPLE
        table: poi
   partitions: NULL
         type: range
possible_keys: pt
          key: pt
      key_len: 34
          ref: NULL
         rows: 31356
     filtered: 10.00
        Extra: Using where; Using filesort

For specifying a an "outer" hexagon I was too lazy. So I was specifying a square:

SET @hereami = POINT(9,4);
SET @square = 'POLYGON((4.5 8.5, 13.5 8.5, 13.5 -0.5, 4.5 -0.5, 4.5 8.5))';

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(ST_GeomFromText(@square), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.02 sec)

So, my shop is in the result again now. And even a bit faster!

Now I wanted to find out if this results are any faster than the conventional method with an index on (x) and (y) or (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE x >=  4.5 AND x = -0.5 AND y <= 1="" 8.5="" and="" st_distance(@hereami,="" pt)="" <="" 4.5="" type="Shop" order="" by="" distance="" asc="" ;="" row="" in="" set="" (0.15="" sec) 
 

Here the optimizer chooses the index on x. But I think he could do better. So I forced to optimizer to use the index on (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi FORCE INDEX (xy)
 WHERE x >=  4.5 AND x = -0.5 AND y <= 1="" 8.5="" and="" st_distance(@hereami,="" pt)="" <="" 4.5="" type="Shop" order="" by="" distance="" asc="" ;="" row="" in="" set="" (0.03="" sec)="" id:="" select_type:="" simple="" table:="" poi="" partitions:="" null="" type:="" range="" possible_keys:="" xy="" key:="" xy
      key_len: 10
          ref: NULL
         rows: 115592
     filtered: 1.11
        Extra: Using index condition; Using where; Using filesort

Same performance than with the spatial index. So it looks like for this simple task with my data distribution conventional methods do well enough.

No I wanted to try a polygon which comes as close as possible to a circle. This I solved with a MySQL stored function which returns a polygon:/p>

DROP FUNCTION polygon_circle;

delimiter //

CREATE FUNCTION polygon_circle(pX DOUBLE, pY DOUBLE, pDiameter DOUBLE, pPoints SMALLINT UNSIGNED)
-- RETURNS VARCHAR(4096) DETERMINISTIC
RETURNS POLYGON DETERMINISTIC
BEGIN

  DECLARE i SMALLINT UNSIGNED DEFAULT 0;
  DECLARE vSteps SMALLINT UNSIGNED;
  DECLARE vPolygon VARCHAR(4096) DEFAULT '';

  -- Input validation

  IF pPoints  360 THEN
    RETURN NULL;
  END IF;
  IF pPoints > 90 THEN
    RETURN NULL;
  END IF;
  if (360 % pPoints) != 0 THEN
    RETURN NULL;
  END IF;

  -- Start

  SET vSteps = 360 / pPoints;

  WHILE i < 360 DO
    SET vPolygon = CONCAT(vPolygon, (pX + (SIN(i * 2 * PI() / 360) * pDiameter)), ' ', (pY + (COS(i * 2 * PI() / 360) * pDiameter)), ', ');
    SET i = i + vSteps;
  END WHILE;

  -- Add first point again
  SET vPolygon = CONCAT("POLYGON((", vPolygon, (pX + (SIN(0 * 2 * PI() / 360) * pDiameter)), " ",  (pY + (COS(0 * 2 * PI() / 360) * pDiameter)), "))");

  -- RETURN vPolygon;
  RETURN ST_GeomFromText(vPolygon);
END;
//

delimiter ;

SELECT ST_AsText(polygon_circle(9, 4, 4.5, 6));
-- SELECT polygon_circle(9, 4, 4.5, 8);

Then calling the query in the same way:

SET @hereami = POINT(9,4);
SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(polygon_circle(9, 4, 4.5, 90), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.03 sec)

This seems not to have any significant negative impact on performance.

Results

Test#rowsoperationlatency
Total655360FTS1300 ms
Spatial exact Circle4128FTS520 ms
Spatial inner Hexagon3916range (pt)20 ms
Spatial outer Square4128range (pt)30 ms
Conventional outer Square on (x)4128range (x) or (y)150 ms
Conventional outer Square on (xy)4128range (x,y)30 ms
Spatial good Polygon4128range (pt)30 ms
Shinguz's blog

责编内容by:Shinguz's blog阅读原文】。感谢您的支持!

您可能感兴趣的

使用阿里云主机离线部署CDH步骤详解 一、Linux文件系统准备 1. 拍摄快照 登录阿里云控制台,拍摄快照,注意有几个关键点尽量拍摄快照,系统初始状态、CM环境准备完成、CM安装完成、CDH安装完成。 2. 挂载设备 三个主机都执行。 创建挂载目录 $mkdir /data ...
40 million tables in MySQL 8.0 with ZFS In my previous blog post about millions of table in MySQL 8 , I was able to create one million tables and test the performance of it. My next challe...
Serverless Computing: AWS Lambda Function Invocati... Today the journey of invoking a AWS Lambda function and storing its result into a table is completed. Some more issues had to be overcome. Recap T...
如何从MySQL全备文件中恢复单个库或者单个表... 在MySQL dba的日常实际工作中,一个实例下有多个库,而我们常见的备份就是全库备份。那么问题就来了,如果需要恢复单个库或者单个表,怎么办了,网上有很多人都有多种方法,今天,我自己结合众多资料,将实践记录下来,以便供参考。 基本情况介绍: MySQL版本:mysql-5.5.36.tar....
从MySQL Bug#67718浅谈B+树索引的分裂优化 问题背景 今天,看到Twitter的DBA团队发布了其最新的MySQL分支: Changes in Twitter MySQL 5.5.28.t9 ,此分支最重要的一个改进,就是修复了MySQL 的Bug #67718: InnoDB drast...