I wanted to get rid of long line segments in my data. Since I don’t know any software tools off the top of my head that would do the job, I decided to code it myself. I have already stored everything in a spatially-enabled PostgreSQL table, and to be honest, recently I am more interested in manipulating data at the database level to save some time. So, instead of writing a python script and looping through a cursor, I created a function which I am calling in SQL queries. The picture below gives an overview of what the following function does. Basically, it breaks input geometries at line segments that exceed a certain threshold in length. Red means “no bueno”, green means “yaay!”.
Like most of the codes in academia, it’s not fancy at all. At least it does the job, so feel free to reuse it. It takes two arguments, a geometry and the threshold in meters. Geometry must be LineString. Some of the input lines will remain the same but those that contain long segments can be MultiLineStrings. Some software and data format may have trouble storing and visualizing mixed geometries in the same column, so make sure you resolve that after running the query.
-- Function that breaks lines at long segments -- Works with simple LineStrings, outputs either LineStrings or MultiLineStrings CREATE OR REPLACE FUNCTION break_lines(line geometry, distance integer) RETURNS geometry AS $BODY$ DECLARE segment geometry; line_1 geometry; started boolean; multi boolean; multi_started boolean; multi_line geometry; length float; i int; count int; BEGIN count := ST_NPoints(line); IF count IS NULL THEN RETURN NULL; END IF; IF count < 2 THEN RETURN NULL; END IF; started := false; multi := false; multi_started := false; FOR i IN 1..count LOOP segment := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1)); length := ST_Length(segment, false); IF length IS NULL THEN CONTINUE; END IF; IF length < distance THEN IF NOT started THEN line_1 := ST_MakeLine(ST_PointN(line, i), ST_PointN(line, i + 1)); started := true; ELSE line_1 := ST_MakeLine(line_1, ST_PointN(line, i + 1)); END IF; ELSE RAISE NOTICE 'Long segment!'; RAISE NOTICE 'Length: %', length; IF line_1 IS NULL THEN CONTINUE; END IF; IF NOT multi_started THEN multi_line := line_1; multi_started := true; started := false; multi := true; ELSE multi_line := ST_Union(multi_line, line_1); multi := true; started := false; END IF; END IF; END LOOP; IF NOT multi THEN RETURN line_1; ELSIF multi AND line_1 IS NOT NULL THEN RETURN ST_Union(multi_line, line_1); ELSE RETURN multi_line; END IF; END $BODY$ LANGUAGE 'plpgsql'; CREATE TABLE split_lines_1000m AS SELECT row_number() OVER (ORDER BY data.geom ASC) as row_number, break_lines(data.geom, 1000) as geom FROM (SELECT * FROM mytable) AS data
Based on my tests made with a relatively big dataset (60.000+ lines) on a 3-year-old notebook, it executes within a reasonable amount of time. I found it useful while working with GPS trajectory like data, since signal loss can result some issues. Please let me know in case you know about a tool that does the same. It would be great to compare performance and do some more tests.