Table Of Contents

Previous topic

Regionalizing with clusterPy

Next topic

Endogenous number of regions (Defined into algorithm)

This Page

Exogenous number of regions (defined by the user)

Arisel [Duque_Church2004]

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.arisel.execArisel(y, w, pRegions, inits=3, initialSolution=[], convTabu=0, tabuLength=10)

Automatic Rationalization with Initial Seed Location

ARiSeL, proposed by [Duque_Church2004] , aggregates N areas into P spatially contiguous regions while minimizing intra-regional heterogeneity (measured as the within-cluster sum of squares from each area to the attribute centroid of its cluster). This algorithm is a modification of Openshaw’s AZP-tabu [Openshaw_Rao1995]. In ARISeL the construction of a initial feasible solution is repeated several times (inits) before running Tabu Search algorithm [Glover1977].

Duque and Church argue that:

  • constructing and initial feasible solution is computationally less expensive than performing local search.
  • local search by moving bordering areas between region do not allow an extensive search in the solution space and it is computationally expensive.

Based on those two ideas, the authors propose to generate as many different initial feasible solutions and run Tabu search on the best initial solution obtained so far.

The initial solution follows a “growing regions” strategy. It starts with a initial set of seeds (as many seed as regions) selected using the K-means++ algorithm. From those seeds, other neighbouring areas are assigned to its closest (in attribute space) growing region. This strategy has proven better results.

Layer.cluster('arisel',vars,regions,<wType>,<std>,<inits>,<initialSolution>,<convTabu>,<tabuLength>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • std (binary) – If = 1, then the variables will be standardized.
  • inits (integer. Default value inits = 5.) – number of initial feasible solutions to be constructed before applying Tabu Search.
  • initialSolution (list) – List with a initial solution vector. It is useful when the user wants a solution that is not very different from a preexisting solution (e.g. municipalities,districts, etc.). Note that the number of regions will be the same as the number of regions in the initial feasible solution (regardless the value you assign to parameter “regions”). IMPORTANT: make sure you are entering a feasible solution and according to the W matrix you selected, otherwise the algorithm will not converge.
  • convTabu (integer) – Stop the search after convTabu nonimproving moves (nonimproving moves are those moves that do not improve the current solution. Note that “improving moves” are different to “aspirational moves”). If convTabu=0 the algorithm will stop after Int(M/N) nonimproving moves. Default value convTabu = 0.
  • tabuLength (integer) – Number of times a reverse move is prohibited. Default value tabuLength = 10.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value dissolve = 0. Note:. Each child layer is saved in the attribute layer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings wich represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By deffault just ID variable is added to the dissolved map.

AZP [Openshaw_Rao1995]

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.azp.execAZP(y, w, pRegions, initialSolution=[])

Automatic Zoning Procedure (AZP)

AZP is a mildly steepest descent algorithm that aggregates N zones (areas) into M regions. “The M output regions should be formed of internally connected, contiguous, zones.” ([Openshaw_Rao1995] pp 428).

In [Openshaw_Rao1995] the objective function is not defined because AZP can be applied to any function, F(Z). “F(Z) can be any function defined on data for the M regions in Z, and Z is the allocation of each of N zones to one of M regions such that each zone is assigned to only one region” ([Openshaw_Rao1995] pp 428).” In clusterPy we Minimize F(Z), where Z is the within-cluster sum of squares from each area to the attribute centroid of its cluster.

NOTE: The original algorithm proposes to start from a random initial feasible solution. Previous computational experience showed us that this approach leads to poor quality solutions. In clusterPy we started from an initial solution that starts with a initial set of seeds (as many seed as regions) selected using the K-means++ algorithm. From those seeds, other neighbouring areas are assigned to its closest (in attribute space) growing region. This strategy has proven better results.

Layer.cluster('azp',vars,regions,<wType>,<std>,<initialSolution>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • std (binary) – If = 1, then the variables will be standardized.
  • initialSolution (list) – List with a initial solution vector. It is useful when the user wants a solution that is not very different from a preexisting solution (e.g. municipalities,districts, etc.). Note that the number of regions will be the same as the number of regions in the initial feasible solution (regardless the value you assign to parameter “regions”). IMPORTANT: make sure you are entering a feasible solution and according to the W matrix you selected, otherwise the algorithm will not converge.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value dissolve = 0. Note: Each child layer is saved in the attribute layer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings wich represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By deffault just ID variable is added to the dissolved map.

AZP Simulated anealing [Openshaw_Rao1995]

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.azpSa.execAZPSA(y, w, pRegions, initialSolution=[], maxit=1)

Simulated Annealing variant of Automatic Zoning Procedure (AZP-SA)

AZP-SA aggregates N zones (areas) into M regions. “The M output regions should be formed of internally connected, contiguous, zones.” ([Openshaw_Rao1995] pp 428).

AZP-SA is a variant of the AZP algorithm that incorporates a seach process, called Simulated Annealing algorithm [Kirkpatrick_Gelatt_Vecchi1983]. Simulated annealing algorithm “permits moves which result in a worse value of the objective function but with a probability that diminishes gradually, through iteration time” ([Openshaw_Rao1995] pp 431).

In Openshaw and Rao (1995) the objective function is not defined because AZP-Tabu can be applied to any function, F(Z). “F(Z) can be any function defined on data for the M regions in Z, and Z is the allocation of each of N zones to one of M regions such that each zone is assigned to only one region” ([Openshaw_Rao1995] pp 428).” In clusterPy we Minimize F(Z), where Z is the within-cluster sum of squares from each area to the attribute centroid of its cluster.

In order to make the cooling schedule robust the units of measure of the objective function, we set the Boltzmann’s equation as: R(0,1) < exp((-(Candidate Solution - Current Solution) / Current Solution)/T(k)). The cooling schedule is T(k) = 0.85 T(k-1) ([Openshaw_Rao1995] pp 431), with an initial temperature T(0)=1.

NOTE: The original algorithm proposes to start from a random initial feasible solution. Previous computational experience showed us that this approach leads to poor quality solutions. In clusterPy we started from an initial solution that starts with a initial set of seeds (as many seed as regions) selected using the K-means++ algorithm. From those seeds, other neighbouring areas are assigned to its closest (in attribute space) growing region. This strategy has proven better results.

layer.cluster('azpSa',vars,regions,<wType>,<std>,<initialSolution>,<maxit>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • std (binary) – If = 1, then the variables will be standardized.
  • initialSolution (list) – List with a initial solution vector. It is useful when the user wants a solution that is not very different from a preexisting solution (e.g. municipalities,districts, etc.). Note that the number of regions will be the same as the number of regions in the initial feasible solution (regardless the value you assign to parameter “regions”). IMPORTANT: make sure you are entering a feasible solution and according to the W matrix you selected, otherwise the algorithm will not converge.
  • maxit (integer) – For a given temperature, perform SA maxit times (see Openshaw and Rao (1995) pp 431, Step b). Default value maxit = 1. NOTE: the parameter Ik, in Step d was fixed at 3.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value = 0. Note:. Each child layer is saved in the attribute layer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings wich represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By deffault just ID variable is added to the dissolved map.

AZP Tabu: [Openshaw_Rao1995]

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.azpTabu.execAZPTabu(y, w, pRegions, initialSolution=[], convTabu=0, tabuLength=10)

Tabu variant of Automatic Zoning Procedure (AZP-Tabu)

AZP-Tabu aggregates N zones (areas) into M regions. “The M output regions should be formed of internally connected, contiguous, zones.” ([Openshaw_Rao1995] pp 428).

AZP-Tabu is a variant of the AZP algorithm that incorporates a search process, called Tabu algorithm [Glover1977]. Tabu “allows the search process to escape from local optima whilst avoiding cyclical behaviour.” ([Openshaw_Rao1995] pp 432).

In Openshaw and Rao (1995) the objective function is not defined because AZP-Tabu can be applied to any function, F(Z). “F(Z) can be any function defined on data for the M regions in Z, and Z is the allocation of each of N zones to one of M regions such that each zone is assigned to only one region” ([Openshaw_Rao1995] pp 428).” In clusterPy we Minimize F(Z), where Z is the within-cluster sum of squares from each area to the attribute centroid of its cluster.

Openshaw and Rao (1995) do not specify a value for the two most important parameters of Tabu seach: length of the tabu list and convergence criteria. See [Duque_Anselin_Rey2010] for an evaluation of the performance of tabu search within the context of spatial clustering.

NOTE: The original algorithm proposes to start from a random initial feasible solution. Previous computational experience showed us that this approach leads to poor quality solutions. In clusterPy we started from an initial solution that starts with a initial set of seeds (as many seed as regions) selected using the K-means++ algorithm. From those seeds, other neighbouring areas are assigned to its closest (in attribute space) growing region. This strategy has proven better results.

layer.cluster('azpTabu',vars,regions,<wType>,<std>,<initialSolution>,<convTabu>,<tabuLength>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • std (binary) – If = 1, then the variables will be standardized.
  • initialSolution (list) – List with a initial solution vector. It is useful when the user wants a solution that is not very different from a preexisting solution (e.g. municipalities,districts, etc.). Note that the number of regions will be the same as the number of regions in the initial feasible solution (regardless the value you assign to parameter “regions”). IMPORTANT: make sure you are entering a feasible solution and according to the W matrix you selected, otherwise the algorithm will not converge.
  • convTabu (integer) – Stop the search after convTabu nonimproving moves (nonimproving moves are those moves that do not improve the current solution. Note that “improving moves” are different to “aspirational moves”). If convTabu=0 the algorithm will stop after max(10,len(N)/M) nonimproving moves. Default value convTabu = 0.
  • tabuLength (integer) – Number of times a reverse move is prohibited. Default value tabuLength = 10.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value = 0. Note:. Each child layer is saved in the attribute layer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings which represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By default just ID variable is added to the dissolved map.

AZP Reactive Tabu: [Openshaw_Rao1995]

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.azpRtabu.execAZPRTabu(y, w, pRegions, initialSolution=[], convTabu=0)

Reactive tabu variant of Automatic Zoning Procedure (AZP-R-Tabu)

AZP-R-Tabu aggregates N zones (areas) into M regions. “The M output regions should be formed of internally connected, contiguous, zones.” ([Openshaw_Rao1995] pp 428).

AZP-R-Tabu is a variant of the AZP algorithm that incorporates a seach process, called Reactive Tabu Search algorithm [Battiti_Tecchiolli1994]. The main difference between the reactive tabu and the tabu search, devised by [Glover1977] , is that the former does not require to define the number of times a reverse move is prohibited (tabuLength). This parameter is dynamically adjusted by the algorithm.

In [Openshaw_Rao1995] the objective function is not defined because AZP-Tabu can be applied to any function, F(Z). “F(Z) can be any function defined on data for the M regions in Z, and Z is the allocation of each of N zones to one of M regions such that each zone is assigned to only one region” ([Openshaw_Rao1995] pp 428).” In clusterPy we Minimize F(Z), where Z is the within-cluster sum of squares from each area to the attribute centroid of its cluster.

NOTE: The original algorithm proposes to start from a random initial feasible solution. Previous computational experience showed us that this approach leads to poor quality solutions. In clusterPy we started from an initial solution that starts with a initial set of seeds (as many seed as regions) selected using the K-means++ algorithm. From those seeds, other neighbouring areas are assigned to its closest (in attribute space) growing region. This strategy has proven better results.

layer.cluster('azpRTabu',vars,regions,<wType>,<std>,<initialSolution>,<convTabu>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • std (binary) – If = 1, then the variables will be standardized.
  • initialSolution (list) – List with a initial solution vector. It is useful when the user wants a solution that is not very different from a preexisting solution (e.g. municipalities,districts, etc.). Note that the number of regions will be the same as the number of regions in the initial feasible solution (regardless the value you assign to parameter “regions”). IMPORTANT: make sure you are entering a feasible solution and according to the W matrix you selected, otherwise the algorithm will not converge.
  • convTabu (integer) – Stop the search after convTabu nonimproving moves (nonimproving moves are those moves that do not improve the current solution. Note that “improving moves” are different to “aspirational moves”). If convTabu=0 the algorithm will stop after Int(M/N) nonimproving moves. Default value convTabu = 0.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value = 0. Note:. Each child layer is saved in the attribute ayer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings wich represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By deffault just ID variable is added to the dissolved map.

Random

clusterpy0_9_9.clusterpy.core.toolboxes.cluster.random.execRandom(y, w, regions)

Generate random regions

This algorithm aggregates, at random, a set of areas into a predefined number of spatially contiguous regions.

layer.cluster('random',vars,regions,<wType>,<dissolve>,<dataOperations>)
Parameters:
  • vars (list) – Area attribute(s) (e.g. [‘SAR1’,’SAR2’])
  • regions (integer) – Number of regions
  • wType (string) – Type of first-order contiguity-based spatial matrix: ‘rook’ or ‘queen’. Default value wType = ‘rook’.
  • dissolve (binary) – If = 1, then you will get a “child” instance of the layer that contains the new regions. Default value = 0. Note:. Each child layer is saved in the attribute layer.results. The first algorithm that you run with dissolve=1 will have a child layer in layer.results[0]; the second algorithm that you run with dissolve=1 will be in layer.results[1], and so on. You can export a child as a shapefile with layer.result[<1,2,3..>].exportArcData(‘filename’)
  • dataOperations (dictionary) – Dictionary which maps a variable to a list of operations to run on it. The dissolved layer will contains in it’s data all the variables specified in this dictionary. Be sure to check the input layer’s fieldNames before use this utility.

The dictionary structure must be as showed bellow.

>>> X = {}
>>> X[variableName1] = [function1, function2,....]
>>> X[variableName2] = [function1, function2,....]

Where functions are strings which represents the name of the functions to be used on the given variableName. Functions could be,’sum’,’mean’,’min’,’max’,’meanDesv’,’stdDesv’,’med’, ‘mode’,’range’,’first’,’last’,’numberOfAreas. By default just ID variable is added to the dissolved map.