Abstract
This data article describes 30 instances of the real-world problem of sequencing steel coils in a continuous galvanizing line. Each instance is represented by a cost matrix that gives the cost of sequencing each pair of coils (items) consecutively, i.e. the cost of each transition. Some transitions are forbidden due to technical limitations of the line and/or the properties of the coils, which makes the problem more challenging. The costs were obtained beforehand with a cost model that estimates the final cost of each transition for a set of coils to be sequenced in the line. Although the instances come from this real context, the problem can be viewed theoretically as finding a minimum-cost Hamiltonian path (i.e. a minimum-cost feasible production sequence in which every coil appears exactly once), a well-known NP-hard combinatorial optimization problem. Since these instances represent real challenges found in industry, they can be very useful for algorithm development and testing. Due to the cost distributions obtained for the given coils, just finding a feasible sequence can be a challenging task, especially for some types of approximate algorithms (Alvarez-Gil et al., 2022).
Keywords: Steel industry, Combinatorial optimization, Manufacturing scheduling, Metaheuristics
Specifications Table
| Subject | Business, Management and decision sciences (Management Science and Operations Research) |
| Specific subject area | Operations Research applications (planning & scheduling, logistics, etc.) and Combinatorial Optimization (algorithm development and benchmarking) |
| Type of data |
|
| How the data were acquired | The data were acquired using customized software that helps the scheduling crew with the sequencing tasks. The software receives the information on the coils to be sequenced. It has an embedded cost model that calculates, for each pair of coils, the cost of sequencing them together based on their properties, generating a cost matrix for the set of coils. The software then allows the cost matrices to be exported in the provided format (.txt). |
| Data format | Raw |
| Description of data collection | The data were acquired ensuring that:
|
| Data source location | Spanish steel factory |
| Data accessibility | Repository name: Problem instances of the sequencing problem of a steel continuous galvanizing line (Mendeley Data) Data identification number: DOI:10.17632/v357z2ncbh.2 Direct URL to data: https://data.mendeley.com/datasets/v357z2ncbh/2 |
| Related research article | N. Alvarez-Gil, S. Alvarez, R. Rosillo, D. de la Fuente, Sequencing Jobs with Asymmetric Costs and Transition Constraints in a Finishing Line: A real case study, Computers & Industrial Engineering 165 (2022). 10.1016/j.cie.2021.107908 |
Value of the Data
- The dataset provides 30 different instances of a real-world sequencing problem with transition constraints and asymmetric costs that can be used to evaluate and compare the performance of algorithms. The reference results presented in [1] can be used as benchmark solutions.
- The instances can help in the development, design and testing of combinatorial optimization algorithms. Since the instances are highly constrained, they can support the development of new approaches able to explore the solution space efficiently and the design of constraint-handling strategies. Due to the cost distributions obtained from the real sets of items to be sequenced in the line, some of the instances are very complex and can be very challenging for approximate algorithms, especially those that use the cost information to guide the exploration, since this focus on costs can misguide the search for feasibility (see [1]). The dataset can therefore be very useful for developing robust solutions able to handle constraints while minimizing the cost.
- Problem instances are provided directly as cost matrices, which makes them easy and ready to use. They represent real challenges of a scheduling problem found in industry, but they can also be abstracted to the more theoretical problem of finding a minimum-cost Hamiltonian path [2]. Hence, they can be used both to gain insights for different combinatorial optimization applications (scheduling, routing, etc.) and for theoretical purposes (graph analysis, constraint-handling strategies, approximate algorithms, etc.).
1. Data Description
The dataset consists of 30 different problem instances with sizes ranging from 17 to 114 items (i.e. coils, or nodes in graph terms). Each instance is named “cgl_X.txt”, where X is the size of the instance; when two instances have the same size, the second one carries the suffix “b” (e.g. cgl_48b.txt). Table 1 shows the name and the size of the problem instances provided in the dataset.
Table 1.
List of problem instance names and sizes.
| Instance name | Size | Instance name | Size |
|---|---|---|---|
| cgl_17.txt | 17 items | cgl_51b.txt | 51 items |
| cgl_26.txt | 26 items | cgl_57.txt | 57 items |
| cgl_28.txt | 28 items | cgl_58.txt | 58 items |
| cgl_32.txt | 32 items | cgl_60.txt | 60 items |
| cgl_33.txt | 33 items | cgl_66.txt | 66 items |
| cgl_37.txt | 37 items | cgl_70.txt | 70 items |
| cgl_38.txt | 38 items | cgl_70b.txt | 70 items |
| cgl_43.txt | 43 items | cgl_72.txt | 72 items |
| cgl_44.txt | 44 items | cgl_73.txt | 73 items |
| cgl_45.txt | 45 items | cgl_76.txt | 76 items |
| cgl_47.txt | 47 items | cgl_78.txt | 78 items |
| cgl_48.txt | 48 items | cgl_81.txt | 81 items |
| cgl_48b.txt | 48 items | cgl_88.txt | 88 items |
| cgl_50.txt | 50 items | cgl_107.txt | 107 items |
| cgl_51.txt | 51 items | cgl_114.txt | 114 items |
Each instance is represented by an n × n square matrix, where n is the size of the instance (i.e. the number of items to be sequenced). The cost matrix gives the cost of scheduling each pair of coils consecutively in the sequence. The element [i,j] of the matrix represents the cost of processing item i right before item j (Cij). An element Cij can take two kinds of values:
- Cij > 0, Cij ∈ ℝ: the transition from item i to item j is allowed (i.e. item i can be produced immediately before item j) with a cost Cij.
- Cij = −1: the transition from item i to item j is not allowed (i.e. item i cannot be produced immediately before item j); this represents a hard constraint.
The problem consists of finding a minimum-cost feasible sequence containing every node exactly once, which can be seen as finding a minimum-cost Hamiltonian path [2]. We refer to a feasible sequence as a sequence in which all the transitions are allowed, that is, a sequence without constraint violations. Just finding a feasible sequence is a challenge in itself, but the cost should also be minimized. When comparing sequences, the fewer constraint violations the better; among feasible sequences, the lower the cost the better. The initial and final items of the sequence are not fixed, so those decisions are part of the optimization problem, since they directly affect the final cost of the sequence.
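As an illustration of the problem statement above, the following minimal Python sketch enumerates every ordering of a tiny instance and keeps the cheapest feasible one, with free start and end items. The 4 × 4 matrix `toy` and the function name `brute_force_best` are hypothetical, chosen only for this example; they are not part of the dataset or of the method in [1]. Exhaustive enumeration grows as n! and is only usable for a handful of items.

```python
from itertools import permutations

# Hypothetical 4-item cost matrix (NOT taken from the dataset).
# toy[i][j] is the cost of producing item i right before item j; -1 = forbidden.
toy = [
    [-1, 10, -1, 40],
    [12, -1, 35, 8],
    [5, 20, -1, 15],
    [-1, 9, 7, -1],
]


def brute_force_best(cost):
    """Return (best_sequence, best_cost) over all feasible orderings.

    Start and end items are free, so every permutation is a candidate.
    Returns (None, inf) if no feasible sequence exists.
    """
    n = len(cost)
    best_seq, best_cost = None, float("inf")
    for seq in permutations(range(n)):
        total = 0.0
        for a, b in zip(seq, seq[1:]):
            if cost[a][b] == -1:
                break  # forbidden transition: discard this ordering
            total += cost[a][b]
        else:
            # Reached only when no forbidden transition was found.
            if total < best_cost:
                best_seq, best_cost = seq, total
    return best_seq, best_cost


best_seq, best_cost = brute_force_best(toy)
print(best_seq, best_cost)  # (1, 3, 2, 0) 20.0
```

For this toy matrix the best sequence starts at item 1 and ends at item 0, which shows why the choice of first and last item belongs to the optimization.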
The problem shares some similarities with other well-known combinatorial optimization problems such as the Asymmetric Traveling Salesman Problem [3,4], the Sequential Ordering Problem [5,6] and the Hamiltonian Cycle Problem [7,8]. More information about the similarities and differences can be found in [1].
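One standard way to exploit the similarity with the Asymmetric Traveling Salesman Problem is to add a dummy node with zero-cost arcs to and from every item, so that a tour through the dummy node corresponds to an open path, and to replace forbidden transitions with a large penalty. The sketch below shows this textbook reduction; it is not taken from [1], and the function name `to_atsp`, the penalty rule and the toy matrix are assumptions made for illustration.

```python
# Hypothetical 4-item cost matrix (NOT taken from the dataset); -1 = forbidden.
toy = [
    [-1, 10, -1, 40],
    [12, -1, 35, 8],
    [5, 20, -1, 15],
    [-1, 9, 7, -1],
]


def to_atsp(cost, big_m=None):
    """Build an (n+1) x (n+1) ATSP matrix from an n x n path-cost matrix.

    Node n is a dummy node: leaving it selects the first item of the path,
    entering it closes the path, both at zero cost.
    """
    n = len(cost)
    if big_m is None:
        # Assumed penalty: larger than the cost of any feasible path, so a
        # tour using a penalised arc is never better than a feasible one.
        big_m = 1 + sum(c for row in cost for c in row if c > 0)
    atsp = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            forbidden = i == j or cost[i][j] == -1
            atsp[i][j] = big_m if forbidden else cost[i][j]
    atsp[n][n] = big_m  # the dummy node cannot follow itself
    return atsp


atsp = to_atsp(toy)
```

A tour returned by an ATSP solver on such a matrix can be cut at the dummy node to recover a sequence; if its cost reaches the penalty value, the recovered sequence contains at least one forbidden transition.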
All the cost matrices are provided in the same format, shown in Fig. 1.
Fig. 1.
Content of file cgl_17.txt.
Each text file (.txt) has n rows, and each row has n columns delimited by the character “;”. Taking instance cgl_17.txt for illustration (Fig. 1), the cost matrix should be interpreted in the following way:
From row 1:
- Matrix element [0, 0] (C00 = −1): not possible, each item must appear just once in the sequence.
- Matrix element [0, 1] (C01 = 300): cost of producing item 0 right before item 1.
- Matrix element [0, 2] (C02 = −1): not possible, production constraint.
- Matrix element [0, 3] (C03 = 654): cost of producing item 0 right before item 3.
- Matrix element [0, 4] (C04 = 926): cost of producing item 0 right before item 4.
- (...)
From row 2:
- Matrix element [1, 0] (C10 = 305): cost of producing item 1 right before item 0.
- Matrix element [1, 1] (C11 = −1): not possible, each item must appear just once in the sequence.
- Matrix element [1, 2] (C12 = 963): cost of producing item 1 right before item 2.
- Matrix element [1, 3] (C13 = 117): cost of producing item 1 right before item 3.
- Matrix element [1, 4] (C14 = 237): cost of producing item 1 right before item 4.
- (...)
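The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' software: the function name `parse_cost_matrix` is hypothetical, and it assumes that values use a dot as decimal separator and that a trailing “;” at the end of a row, if present, can be ignored. The example string is made up and only mimics the format.

```python
from typing import List

FORBIDDEN = -1.0  # value used in the instances for a disallowed transition


def parse_cost_matrix(text: str, delimiter: str = ";") -> List[List[float]]:
    """Parse the contents of an instance file into a square list of lists."""
    matrix = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        # Assumption: empty fields (e.g. after a trailing delimiter) are dropped.
        fields = [f for f in line.split(delimiter) if f.strip()]
        matrix.append([float(f) for f in fields])
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("cost matrix is not square")
    return matrix


# Made-up 3-item example in the same layout (values are NOT from the dataset).
toy_text = "-1;300;-1\n305;-1;963\n120;-1;-1\n"
toy = parse_cost_matrix(toy_text)

# Count forbidden transitions, ignoring the diagonal.
forbidden = sum(
    1
    for i, row in enumerate(toy)
    for j, c in enumerate(row)
    if i != j and c == FORBIDDEN
)
print(len(toy), forbidden)  # 3 2
```

For a downloaded instance, the same function could be applied to the file contents, for example `parse_cost_matrix(open("cgl_17.txt").read())`.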
A possible solution for this instance is S = [2, 0, 1, 3, 4, 5, 6, 7, 16, 11, 15, 10, 8, 12, 14, 13, 9]. It is a valid sequence since every item appears exactly once and the length of the sequence is n. The final cost of a sequence is the sum of its transition costs, and a sequence of length n has n − 1 transitions. Writing s_k for the k-th item of S, the cost CS is:
$$C_S = \sum_{k=1}^{n-1} C_{s_k\, s_{k+1}} = C_{2,0} + C_{0,1} + C_{1,3} + \dots + C_{14,13} + C_{13,9}$$
It is worth mentioning that, for all the instances of the dataset, the sequence [0, 1, 2, 3, …, n−1] is a feasible sequence and can be used as a reference solution. The only exception is instance cgl_38.txt, for which that sequence contains one constraint violation.
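The evaluation of a sequence described above can be sketched as follows. This is an illustrative helper, not code from the article: the function name `evaluate_sequence` and the 4 × 4 matrix are hypothetical, and the choice to report violations and cost as a pair (so that sequences can be compared first by violations, then by cost) is an assumption.

```python
from typing import Sequence, Tuple

# Hypothetical 4-item cost matrix (NOT taken from the dataset); -1 = forbidden.
toy = [
    [-1, 10, -1, 40],
    [12, -1, 35, 8],
    [5, 20, -1, 15],
    [-1, 9, 7, -1],
]


def evaluate_sequence(
    cost: Sequence[Sequence[float]], seq: Sequence[int]
) -> Tuple[int, float]:
    """Return (number of forbidden transitions, sum of allowed transition costs)."""
    n = len(cost)
    if sorted(seq) != list(range(n)):
        raise ValueError("sequence must contain every item exactly once")
    violations, total = 0, 0.0
    # A sequence of n items has n - 1 consecutive transitions.
    for a, b in zip(seq, seq[1:]):
        c = cost[a][b]
        if c == -1:
            violations += 1
        else:
            total += c
    return violations, total


identity = list(range(len(toy)))           # the sequence [0, 1, ..., n-1]
print(evaluate_sequence(toy, identity))    # (0, 60.0)
print(evaluate_sequence(toy, [2, 0, 1, 3]))  # (0, 23.0)
print(evaluate_sequence(toy, [0, 2, 1, 3]))  # (1, 28.0)
```

Because Python compares tuples element by element, the returned pairs can be passed directly to `min` to prefer fewer violations first and lower cost second.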
2. Experimental Design, Materials and Methods
The instances come from the real-world problem of sequencing steel coils (i.e. the final format of flat steel products) in a continuous galvanizing line. Galvanization is a finishing process that provides the coils with a zinc layer to protect them against air and moisture [9]. The line operators use a stand-alone scheduling application installed on their computers that receives, for each day, the set of coils to be sequenced and their main properties (width, thickness, steel grade, zinc coating, etc.). This information is obtained by connecting directly to the plant's Manufacturing Execution System (MES). The software then calculates the cost of processing each pair of coils consecutively and creates the cost matrix, which is the only input required for the sequencing problem.
The galvanizing process is continuous: each coil is welded to the next one to form a continuous strip. The cost of sequencing two coils together depends on their properties, and the cost model seeks to avoid sudden changes in the main properties (width, thickness, zinc coating, etc.) when a new coil enters the line. For example, if two consecutive coils have different widths, some meters of strip near the weld may not meet the quality requirements and may have to be sold as scrap at a much lower price. Additionally, if the change in some properties is very sharp, there is a risk of strip breakage, and that transition is penalized with a high cost. The scheduling software also calculates the constraints: some coils cannot be sequenced together because their steel grades are not weldable or because the difference in some of their properties exceeds a threshold defined by the expert schedulers.
To obtain the problem instances of the dataset, we looked for daily schedules that had actually been processed in the line in the past, so that we could be sure that each set of coils is sequenceable. We then used the scheduling software to load those schedules, generate the cost matrices and export them in the provided format.
Ethics Statement
N/A.
CRediT authorship contribution statement
Nicolás Álvarez-Gil: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Writing – original draft, Writing – review & editing, Data curation. Segundo Álvarez García: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Writing – review & editing, Data curation. Rafael Rosillo: Conceptualization, Methodology, Visualization, Investigation, Supervision, Writing – review & editing. David de la Fuente: Conceptualization, Investigation, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
N/A.
References
1. Alvarez-Gil N., Alvarez S., Rosillo R., de la Fuente D. Sequencing jobs with asymmetric costs and transition constraints in a finishing line: a real case study. Comput. Ind. Eng. 2022;165. doi: 10.1016/j.cie.2021.107908.
2. Sohel Rahman M., Kaykobad M. On Hamiltonian cycles and Hamiltonian paths. Inf. Process. Lett. 2005;94:37–41. doi: 10.1016/j.ipl.2004.12.002.
3. Applegate D.L., Bixby R.E., Chvátal V., Cook W.J. The Traveling Salesman Problem: A Computational Study. Princeton University Press; 2007.
4. Helsgaun K. An effective implementation of the Lin–Kernighan traveling salesman heuristic. Eur. J. Oper. Res. 2000;126:106–130. doi: 10.1016/S0377-2217(99)00284-2.
5. Escudero L.F. An inexact algorithm for the sequential ordering problem. Eur. J. Oper. Res. 1988;37:236–249. doi: 10.1016/0377-2217(88)90333-5.
6. Ascheuer N., Jünger M., Reinelt G. A branch & cut algorithm for the asymmetric traveling salesman problem with precedence constraints. Comput. Optim. Appl. 2000;17(200):61–84. doi: 10.1023/A:1008779125567.
7. Garey M.R., Johnson D.S., Tarjan R.E. The planar Hamiltonian circuit problem is NP-complete. SIAM J. Comput. 1976;5:704–714. doi: 10.1137/0205049.
8. Garey M.R., Johnson D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company; 1979.
9. Fernández S., Álvarez S., Díaz D., Iglesias M., Ena B. Scheduling a galvanizing line by ant colony optimization. Vol. 8667. 2014; pp. 146–157. (ANTS 2014: Lecture Notes in Computer Science).

