Shinguz: MySQL spatial functionality – points of interest around me

存储架构 2016-06-01 阅读原文

This week I was preparing the exercises for our MySQL/MariaDB for Beginners training . One of the exercises of the training is about MySQL spatial (GIS) features . I always tell customers: "With these features you can answer questions like: Give me all points of interest around me!"

Now I wanted to try out how it really works and if it is that easy at all...

To get myself an idea of what I want to do I did a little sketch first:

My position
Shops
Restaurants
Cafes

To do this I needed a table and some data:

CREATE TABLE poi (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT
, name VARCHAR(40)
, type VARCHAR(20)
, sub_type VARCHAR(20)
, pt POINT NOT NULL
, PRIMARY KEY (id)
, SPATIAL INDEX(pt)
) ENGINE=InnoDB;

INSERT INTO poi (name, type, sub_type, pt) VALUES
  ('Shop 1', 'Shop', 'Cloth', Point(2,2))
, ('Cafe 1', 'Cafe', '',  Point(11,2))
, ('Shop 2', 'Shop', 'Cloth',  Point(5,4))
, ('Restaurant 1', 'Restaurant', 'Portugies',  Point(8,7))
, ('Cafe 2', 'Cafe', '',  Point(3,9))
, ('Shop 3', 'Shop', 'Hardware',  Point(11,9))
;

This looks as follows:

SELECT id, CONCAT(ST_X(pt), '/', ST_Y(pt)) AS "X/Y", name, type, sub_type
  FROM poi;
+----+-----------+--------------+------------+-----------+
| id | X/Y       | name         | type       | sub_type  |
+----+-----------+--------------+------------+-----------+
|  1 | 2/2       | Shop 1       | Shop       | Cloth     |
|  2 | 11/2      | Cafe 1       | Cafe       |           |
|  3 | 5/4       | Shop 2       | Shop       | Cloth     |
|  4 | 8/7       | Restaurant 1 | Restaurant | Portugies |
|  5 | 3/9       | Cafe 2       | Cafe       |           |
|  6 | 11/9      | Shop 3       | Shop       | Hardware  |
+----+-----------+--------------+------------+-----------+

Now the question: "Give me all shops in a distance of 4.5 units around me":

SET @hereami = POINT(9,4);

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.37 sec)

The query execution plan looks like this:

id: 1
  select_type: SIMPLE
        table: poi
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 650361
     filtered: 10.00
        Extra: Using where; Using filesort

So no use of the spatial index yet. 🙁

Reading the MySQL documentation Using Spatial Indexes provides some more information:

The optimizer investigates whether available spatial indexes can be involved in the search for queries that use a function such as MBRContains() or MBRWithin() in the WHERE clause.

So it looks like the optimizer CAN evaluate function covered fields in this specific case. But not with the function ST_Distance I have chosen.

So my WHERE clause must look like: "Give me all points within a polygon spanned 4.5 units around my position..."

I did not find any such function in the short run. So I created a hexagon which is not too far from a circle...

With this hexagon I tried again:

SET @hereami = POINT(9,4);
SET @hexagon = 'POLYGON((9 8.5, 12.897 6.25, 12.897 1.75, 9 -0.5, 5.103 1.75, 5.103 6.25, 9 8.5))';

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(ST_GeomFromText(@hexagon), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
Empty set (0.03 sec)

And tadaaah: Damned fast, but the result is not the same! 🙁 When you look at the graph above it is obvious why: The missing shop is 0.103 units outside of our hexagon search range but within our circle range. So an octagon would have been the better approach...

At least the index is considered now! 🙂

id: 1
  select_type: SIMPLE
        table: poi
   partitions: NULL
         type: range
possible_keys: pt
          key: pt
      key_len: 34
          ref: NULL
         rows: 31356
     filtered: 10.00
        Extra: Using where; Using filesort

For specifying a an "outer" hexagon I was too lazy. So I was specifying a square:

SET @hereami = POINT(9,4);
SET @square = 'POLYGON((4.5 8.5, 13.5 8.5, 13.5 -0.5, 4.5 -0.5, 4.5 8.5))';

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(ST_GeomFromText(@square), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.02 sec)

So, my shop is in the result again now. And even a bit faster!

Now I wanted to find out if this results are any faster than the conventional method with an index on (x) and (y) or (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE x >=  4.5 AND x = -0.5 AND y <= 1="" 8.5="" and="" st_distance(@hereami,="" pt)="" <="" 4.5="" type="Shop" order="" by="" distance="" asc="" ;="" row="" in="" set="" (0.15="" sec) 
 

Here the optimizer chooses the index on x. But I think he could do better. So I forced to optimizer to use the index on (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi FORCE INDEX (xy)
 WHERE x >=  4.5 AND x = -0.5 AND y <= 1="" 8.5="" and="" st_distance(@hereami,="" pt)="" <="" 4.5="" type="Shop" order="" by="" distance="" asc="" ;="" row="" in="" set="" (0.03="" sec)="" id:="" select_type:="" simple="" table:="" poi="" partitions:="" null="" type:="" range="" possible_keys:="" xy="" key:="" xy
      key_len: 10
          ref: NULL
         rows: 115592
     filtered: 1.11
        Extra: Using index condition; Using where; Using filesort

Same performance than with the spatial index. So it looks like for this simple task with my data distribution conventional methods do well enough.

No I wanted to try a polygon which comes as close as possible to a circle. This I solved with a MySQL stored function which returns a polygon:/p>

DROP FUNCTION polygon_circle;

delimiter //

CREATE FUNCTION polygon_circle(pX DOUBLE, pY DOUBLE, pDiameter DOUBLE, pPoints SMALLINT UNSIGNED)
-- RETURNS VARCHAR(4096) DETERMINISTIC
RETURNS POLYGON DETERMINISTIC
BEGIN

  DECLARE i SMALLINT UNSIGNED DEFAULT 0;
  DECLARE vSteps SMALLINT UNSIGNED;
  DECLARE vPolygon VARCHAR(4096) DEFAULT '';

  -- Input validation

  IF pPoints  360 THEN
    RETURN NULL;
  END IF;
  IF pPoints > 90 THEN
    RETURN NULL;
  END IF;
  if (360 % pPoints) != 0 THEN
    RETURN NULL;
  END IF;

  -- Start

  SET vSteps = 360 / pPoints;

  WHILE i < 360 DO
    SET vPolygon = CONCAT(vPolygon, (pX + (SIN(i * 2 * PI() / 360) * pDiameter)), ' ', (pY + (COS(i * 2 * PI() / 360) * pDiameter)), ', ');
    SET i = i + vSteps;
  END WHILE;

  -- Add first point again
  SET vPolygon = CONCAT("POLYGON((", vPolygon, (pX + (SIN(0 * 2 * PI() / 360) * pDiameter)), " ",  (pY + (COS(0 * 2 * PI() / 360) * pDiameter)), "))");

  -- RETURN vPolygon;
  RETURN ST_GeomFromText(vPolygon);
END;
//

delimiter ;

SELECT ST_AsText(polygon_circle(9, 4, 4.5, 6));
-- SELECT polygon_circle(9, 4, 4.5, 8);

Then calling the query in the same way:

SET @hereami = POINT(9,4);
SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance
  FROM poi
 WHERE MBRContains(polygon_circle(9, 4, 4.5, 90), pt)
   AND ST_Distance(@hereami, pt) < 4.5
   AND type = 'Shop'
 ORDER BY distance ASC
;
+----+------------+--------+----------+
| id | point      | name   | distance |
+----+------------+--------+----------+
|  3 | POINT(5 4) | Shop 2 |     4.00 |
+----+------------+--------+----------+
1 row in set (0.03 sec)

This seems not to have any significant negative impact on performance.

Results

Test#rowsoperationlatency
Total655360FTS1300 ms
Spatial exact Circle4128FTS520 ms
Spatial inner Hexagon3916range (pt)20 ms
Spatial outer Square4128range (pt)30 ms
Conventional outer Square on (x)4128range (x) or (y)150 ms
Conventional outer Square on (xy)4128range (x,y)30 ms
Spatial good Polygon4128range (pt)30 ms
Shinguz's blog

责编内容by:Shinguz's blog阅读原文】。感谢您的支持!

您可能感兴趣的

启动mysql失败 背景:ubuntu下运行的django项目,前一天还跑的好好地,今天突然报错,检查后发现mysql启动失败 Starting mysql (via systemctl): mysql.serviceJob for mysql.servi...
青铜到王者,看看你的MySQL数据库是什么段位,如何提升?... 新的一周,老张(superZS)与大家见面,我们又要面临快速的生活节奏而令人厌恶的工作!现在大多数人选择放松自己的方式就是玩游戏,最为突出的可能就要属手游"王者荣耀"。 据说这款游戏上到70旬老者,下至小学生都玩,老张我也玩。段位...
sqlserver、Mysql、Oracle三种数据库的优缺点总结... 一、sqlserver 优点:易用性、适合分布式组织的可伸缩性、用于决策支持的数据仓库功能、与许多其他服务器软件紧密关联的集成性、良好的性价比等; 为数据管理与分析带来了灵活性,允许单位在快速变化的环境中从容响...
Pinterest架构解析 Pinterest是全球最大的图片社交网站,创建于2010年,当前服务全球数亿活跃用户,估值已经超过100亿美元。其快速增长的流量曾经给工程团队带来极大的挑战。本文跟据QCon技术大会上的分享,以及Pinterest技术博客的内容...
Write-heavy workloads with MyRocks MyRocks is based on RocksDB and RocksDB is write-optimized, so why don't write-heavy workloads always run faster on MyRo...