The ESRI Shapefile Model

Introduced in the early 1990s, the shapefile format is one of the most common GIS vector data formats and is supported by the majority of software platforms. It was designed as a compromise: it takes the most widely used database format of the time (dBase) and pairs it with an indexed feature file that holds the geometry. It is used to create and disseminate vector data such as points, polylines, and polygons, together with their associated attributes. Although a shapefile functions as a single element, it is composed of at least three files that share the same name but have different extensions:

  • *.shp: The binary file containing the geometry of the features. Only one type of geometry can be stored per shapefile. Coordinates are stored as plain X, Y values, which makes the file compatible with various spatial referencing models, including longitudes and latitudes. Three-dimensional data can also be stored, such as altitude information related to each feature component. Each component file is limited to 2 gigabytes, which corresponds to roughly 70 million point features; the number of polyline or polygon features that fit depends on how many vertices each one contains.
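  • *.shx: Index file; a positional index that records where each feature's geometry is located in the *.shp file, so that a feature can be read directly and matched, by record order, with the corresponding record in the attribute table.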
  • *.dbf: Attribute table where each feature corresponds to a record. This information is stored in dBase IV format, which is a legacy format with several limitations. There cannot be more than 255 fields in the database, and each field’s name is limited to 10 characters.
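The single-geometry-type rule is visible in the file itself: every *.shp file begins with a fixed 100-byte header that carries one shape type code and the bounding box of all features. The sketch below decodes that header using only the standard library. The byte offsets follow the published shapefile specification; the function names and the synthetic header used in the demonstration are illustrative assumptions, not part of any GIS package.

```python
import struct

# Subset of the shape type codes defined by the shapefile specification.
SHAPE_TYPES = {
    0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
    11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
}

def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header found at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # Bytes 0-3: file code, big-endian, always 9994.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("file code is not 9994; not a shapefile")
    # Bytes 24-27: total file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version and shape type, both little-endian.
    version, shape_code = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: bounding box as four little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_code, f"unknown ({shape_code})"),
        "file_bytes": length_words * 2,
        "bbox": (xmin, ymin, xmax, ymax),
    }

def make_demo_header(shape_code: int, bbox: tuple) -> bytes:
    """Build a synthetic header (hypothetical helper, for demonstration only)."""
    return (
        struct.pack(">i", 9994) + bytes(20)      # file code + five unused ints
        + struct.pack(">i", 50)                  # 50 words = 100 bytes, header only
        + struct.pack("<ii", 1000, shape_code)   # version 1000, shape type
        + struct.pack("<4d", *bbox)              # xmin, ymin, xmax, ymax
        + bytes(32)                              # Z and M ranges, left at zero
    )

info = parse_shp_header(make_demo_header(5, (-74.0, 40.5, -73.5, 41.0)))
print(info["shape_type"], info["bbox"])
```

For a real file, the same function could be applied to the first 100 bytes read in binary mode, for example `open(path, "rb").read(100)`, where `path` is whatever *.shp file is at hand.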

Other files can be associated with a shapefile. The most common is the *.prj file, which stores the georeferencing system related to the features; another is the *.xml file, which stores metadata. All the files of a shapefile must be stored in the same folder. Otherwise, the shapefile will not be accessible.
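Since a shapefile only works when its component files sit side by side, a quick check before distributing or loading one can save trouble. The following is a minimal sketch, assuming the components share the exact base name and use lowercase extensions; the function name `missing_components` and the file name in the usage line are hypothetical.

```python
from pathlib import Path

# The three components every shapefile needs, per the list above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_components(shp_path) -> list:
    """Return the required extensions that are absent next to shp_path.

    Assumes lowercase extensions; on case-sensitive file systems a file
    named ROADS.SHX would be reported as missing by this simple check.
    """
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]

def has_projection(shp_path) -> bool:
    """Report whether the optional .prj file is present in the same folder."""
    return Path(shp_path).with_suffix(".prj").exists()
```

A call such as `missing_components("data/roads.shp")` returns an empty list when the three required files are all present in the same folder, and otherwise lists the extensions that still need to be copied along.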

Topological information cannot be stored in a shapefile, but a GIS can build a topology from the information contained in shapefiles. Because no topology is stored, shapefiles are less computing-intensive to display, an advantage that mattered when processing speed was slower but has become less relevant.