Breaking lines at long segments

I wanted to get rid of long line segments in my data. Since I don’t know any software tools off the top of my head that would do the job, I decided to code it myself. I have already stored everything in a spatially-enabled PostgreSQL table, and to be honest, recently I am more interested in manipulating data at the database level to save some time. So, instead of writing a python script and looping through a cursor, I created a function which I am calling in SQL queries. The picture below gives an overview of what the following function does. Basically, it breaks input geometries at line segments that exceed a certain threshold in length. Red means “no bueno”, green means “yaay!”.

1

Like most of the codes in academia, it’s not fancy at all. At least it does the job, so feel free to reuse it. It takes two arguments, a geometry and the threshold in meters. Geometry must be LineString. Some of the input lines will remain the same but those that contain long segments can be MultiLineStrings. Some software and data format may have trouble storing and visualizing mixed geometries in the same column, so make sure you resolve that after running the query.


-- Function that breaks lines at long segments
-- Works with simple LineStrings, outputs either LineStrings or MultiLineStrings
CREATE OR REPLACE FUNCTION break_lines(line geometry, distance integer) RETURNS geometry AS
$BODY$
DECLARE
segment geometry;
line_1 geometry;
started boolean;
multi boolean;
multi_started boolean;
multi_line geometry;
length float;
i int;
count int;

BEGIN
count := ST_NPoints(line);
IF count IS NULL THEN
RETURN NULL;
END IF;
IF count < 2 THEN
RETURN NULL;
END IF;
started := false;
multi := false;
multi_started := false;
FOR i IN 1..count LOOP
segment := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1));
length := ST_Length(segment, false);
IF length IS NULL THEN
CONTINUE;
END IF;
IF length < distance THEN
IF NOT started THEN
line_1 := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1));
started := true;
ELSE
line_1 := ST_MakeLine(line_1, ST_PointN(line, i + 1));
END IF;
ELSE
RAISE NOTICE 'Long segment!';
RAISE NOTICE 'Length: %', length;
IF line_1 IS NULL THEN
CONTINUE;
END IF;
IF NOT multi_started THEN
multi_line := line_1;
multi_started := true;
started := false;
multi := true;

ELSE
multi_line := ST_Union(multi_line, line_1);
multi := true;
started := false;
END IF;
END IF;
END LOOP;
IF NOT multi THEN
RETURN line_1;
ELSIF multi AND line_1 IS NOT NULL THEN
RETURN ST_Union(multi_line, line_1);
ELSE
RETURN multi_line;
END IF;
END
$BODY$
LANGUAGE 'plpgsql';

CREATE TABLE split_lines_1000m AS
SELECT row_number() OVER (ORDER BY data.geom ASC) as row_number, break_lines(data.geom, 1000) as geom
FROM (SELECT * FROM mytable) AS data

Based on my tests made with a relatively big dataset (60.000+ lines) on a 3-year-old notebook, it executes within a reasonable amount of time. I found it useful while working with GPS trajectory like data, since signal loss can result some issues. Please let me know in case you know about a tool that does the same. It would be great to compare performance and do some more tests.

subset

Leave a Reply

Your email address will not be published. Required fields are marked *