Abstract
This dataset presents demographic information of 16,012 students admitted to various regional campuses of the Universidad Nacional de Colombia. The data are compared against departmental population statistics from the Colombian National Administrative Department of Statistics (DANE).
The dataset enables comparative analysis of higher education access gaps by employing Gini index calculations to quantify inequality and examine the impact of students’ school background, age, and socioeconomic status. The objective is to identify disparities in access among citizens aged 20–39 compared to the population admitted to higher education programs.
Keywords: Demographics, Population distribution, Socioeconomic status, Educational background, Geographic origin, Admitted students
Specifications Table
| Subject | Social Sciences |
| Specific subject area | Demographic dataset by department for higher education access at Universidad Nacional de Colombia’s regional campuses |
| Type of data | Tables and figures. |
| Data collection | Demographic data were taken from the open data of the National Administrative Department of Statistics (DANE), along with data on students admitted to higher education programs at the regional campuses of the Universidad Nacional de Colombia. The admissions data were collected from academic records through the Office of Academic Information, Registration, and Enrollment (Registrar’s Office). https://www.dane.gov.co/index.php/estadisticas-por-tema/demografia-y-poblacion/proyecciones-de-poblacion |
| Data source location | Institution: Universidad Nacional de Colombia. City/Town/Region: 1. Amazonia Campus; 2. Caribbean Campus; 3. De La Paz Campus; 4. Manizales Campus; 5. Palmira Campus; 6. Orinoquia Campus; and 7. Tumaco Campus. Country: Colombia. |
| Data accessibility | Repository name: Mendeley Data. Data identification number: 10.17632/jmhghbpvjb.3. Direct URL to data: https://doi.org/10.17632/jmhghbpvjb.3. Citation: Villa Garzon, Fernan Alonso; Valencia García, María Carolina; Branch, John W. (2025), “Dataset on Higher Education Access and Population Distribution in Seven Campuses in Colombia”, Mendeley Data, V3. |
| Related research article | None |
1. Value of the Data
- This dataset supports research on access to higher education aimed at closing educational gaps at the national and international levels. It contains demographic information, socioeconomic level, type of high school of origin, and age, together with the history of admissions by academic semester from the first semester of 2020 to the second semester of 2024, which can be compared with the departmental population.
- The dataset relates the population at the departmental level to access to higher education at the regional campuses of Amazonia, Caribe, Manizales, Orinoquia, Palmira, Tumaco, and De la Paz of the Universidad Nacional de Colombia.
- By identifying gaps in access to higher education in various departments of Colombia through the calculation of the access rate, programs can be designed to improve access to higher education.
- The dataset makes it possible to measure what percentage of the population is not part of higher education programs, which supports the creation of strategies adapted to the needs of each region of the country.
- The dataset allows for the study of admission and retention strategies for groups underrepresented by gender or ethnicity within the admitted population, supporting the development of inclusion and diversity policies.
- This dataset can be used as a reference in studies on education at the Latin American level.
2. Background
Inequality in higher education is one of the factors that most limits a person's ability to choose a training program after completing their studies. Identifying and creating educational inclusion programs can increase access levels [1,2].
The Universidad Nacional de Colombia, in its Institutional Strategic Plan (PLEI), aims to build an inter-campus Ecosystem milestone of public leadership to address gaps [3].
The variables analyzed are admission semester, campus, age, socioeconomic level, type of school of origin, and department of origin of the student, in order to measure whether they affect access to higher education.
To understand the rate of access to higher education and identify the population most likely to enter the university, an age range of 20 to 39 years is used. Demographic information from the National Administrative Department of Statistics (DANE) is crossed with the characterization of active students for the periods from the first semester of 2020 to the second semester of 2024, taken from the Registrar’s Office of the Universidad Nacional de Colombia. From the crossed datasets, the access rate is calculated to determine the gaps by identifying supply and demand in higher education [4,5].
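The access rate is not written out as a formula in this article. One plausible formalization, offered only as an assumption consistent with the description above, divides the number of admitted students from a department in a given year by that department's projected population aged 20 to 39. The symbols $A_{d,t}$ and $P_{d,t}$ are introduced here for illustration and do not appear in the dataset.

$$
\text{AccessRate}_{d,t} = \frac{A_{d,t}}{P_{d,t}}, \qquad
P_{d,t} = \sum_{a=20}^{39} \text{Pop}_{d,t}(a)
$$

Here $d$ indexes the department, $t$ the year, $A_{d,t}$ is the count of admitted students, and $\text{Pop}_{d,t}(a)$ is the DANE projection for age $a$.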
Recognizing the aspiring population that could enter a higher education program would allow for new strategies that promote access and improve the academic offer, such as supporting the immersion and adaptation of students in their first university semesters and improving academic retention [3].
3. Data Description
The final dataset integrates information from two main sources. The first is the Registrar’s Office of the Universidad Nacional de Colombia, which provided admission records for students in higher education programs between 2011 and 2023. The second is the National Administrative Department of Statistics (DANE), which provided population projections by department for the period 2020–2024, disaggregated by age and gender.
The resulting dataset contains 16,012 records of students admitted to higher education programs, linked to population information from Colombia's departments. Its main variables, all translated from Spanish to English, are described in Table 1.
Table 1.
Description of admitted dataset variables.
| Variable | Description |
|---|---|
| CAMPUS | Campus of the Universidad Nacional de Colombia. |
| SEMESTER | Academic period of student admission. |
| GENDER | Gender of the student. |
| LEVEL_SOCIOECONOMIC | Classification of people or residential properties according to their economic situation. |
| AGE | Age of each of the active students for the academic period. |
| ID | Anonymized code for each student in the dataset. |
| TYPE OF SCHOOL | Type of school of origin of the student, Public or Private. |
| DEPARTMENT_RESIDENT | Department of Colombia of residence of the student. |
The above data allows for the examination of whether there is any relationship between socioeconomic level, age, gender, and type of school of origin of the student in access to higher education programs, when compared with the population data of the departments of origin of the admitted students.
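The abstract mentions Gini index calculations, but the exact procedure is not described in this article. The following is a minimal sketch of one common rank-based formulation, applied to a list of non-negative values such as departmental access rates. The function name `gini` and the sample rates are hypothetical and chosen only for illustration.

```python
def gini(values):
    """Gini index of non-negative values: 0 means perfect equality."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with x sorted in ascending order and ranks i = 1..n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Made-up access rates for three departments (not taken from the dataset).
sample_rates = [0.004, 0.008, 0.002]
inequality = gini(sample_rates)
```

A value near 0 would indicate that access is spread evenly across the compared groups, while values closer to 1 indicate concentration in a few of them.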
Additionally, a second dataset taken from the National Administrative Department of Statistics is used, where the departments associated with the regional campuses of the Universidad Nacional de Colombia are identified (Caldas, Cesar, Nariño, Valle del Cauca, Arauca, Archipelago of San Andres, Amazonas), to calculate the rate of access to higher education (Table 2).
Table 2.
Description of population dataset by department.
| Variable | Description |
|---|---|
| DEPARTMENT | Department of Colombia where the campus is located. |
| YEAR | Year in which the population census was conducted |
| MEN BY AGE | Distribution of the male population by age (0 to 100 years). |
| FEMALE BY AGE | Distribution of the female population by age (0 to 100 years). |
| TOTAL BY AGE | Total of men and women by age (0 to 100 years). |
| TOTAL MEN | Sum of the male population. |
| TOTAL FEMALE | Sum of the female population. |
| GRANDTOTAL | Sum of the total population. |
The data start in 2020, the year the COVID-19 pandemic began and universities experienced academic dropouts, in order to show how entry to higher education has normalized over the last four years [6].
One relevant aspect is the “No Information” category in the DEPARTMENT_RESIDENT variable. This corresponds to records for which the data was not available in the original institutional database. This category was kept without imputation, as it reflects the limitations of the registration system. Its presence should be considered by researchers wishing to conduct regional analyses.
The database of admitted students from the seven campuses of the Universidad Nacional de Colombia was processed using the analytical tools listed in Table 3.
Table 3.
Tools and libraries used.
| Tool | Use |
|---|---|
| Python | Main programming language |
| Pandas | Data manipulation and cleaning |
| Matplotlib / Seaborn | Generation of graphs and visualization |
| Scikit-learn | Data preprocessing and transformation |
4. Experimental Design, Materials and Methods
To compile the datasets, administrative records from the office of the Academic and Registration Directorate of the Universidad Nacional de Colombia (DINARA) were consolidated, with detailed information on each of the regional campuses. These records were anonymized using a hash code and organized according to the campus of each student; additionally, the population data was taken from the public information of the National Administrative Department of Statistics (DANE).
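The article states that records were anonymized using a hash code but does not name the algorithm. As an illustration only, the sketch below assumes a salted SHA-256 digest. The function `anonymize_id`, the salt handling, and the sample identifiers are all hypothetical.

```python
import hashlib
import secrets


def anonymize_id(student_id: str, salt: bytes) -> str:
    """Return a salted SHA-256 hex digest for a student identifier."""
    return hashlib.sha256(salt + student_id.encode("utf-8")).hexdigest()


# Hypothetical secret salt: it must stay private and be reused for the whole
# dataset so that the same student always maps to the same anonymized code.
SALT = secrets.token_bytes(16)

raw_ids = ["1001", "1002", "1001"]  # made-up identifiers for illustration
anonymized = [anonymize_id(s, SALT) for s in raw_ids]
```

Because the mapping is deterministic for a fixed salt, repeated records of the same student share one code, which keeps the ID variable usable for counting distinct students.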
The dataset taken from the academic registration office of the Universidad Nacional de Colombia contains information by academic period, campus, gender, age, socioeconomic level, ID, type of school and department of origin, and includes information from 16,012 students, which was cross-referenced with the population information of the National Administrative Department of Statistics (DANE).
The dataset construction process was carried out in three main stages, followed by a fourth stage dedicated to visualization:
4.1. First stage, data collection
The admissions records were obtained from the Registrar’s Office of the Universidad Nacional de Colombia and include individual-level information on students admitted to higher education programs between 2011 and 2023. For the population, official projections from DANE (2020–2024) were used, which provide demographic distribution by department, age, and gender. All admissions records were anonymized to ensure confidentiality and data protection.
4.2. Second stage, cleaning and standardization
The admissions records underwent a process of review and adjustment to ensure consistency in variables such as program type, year of admission, and student demographics. The population data from DANE was reformatted to align with the categories and timeframes used in the university records.
- Standardization of department names, correcting inconsistencies in capitalization, accents, and abbreviations.
- Homogenization of categorical variables, for example: “Male/Female” → “M/F.”
- Conversion of numeric fields (age, year) to integer format.
- Preservation of the “No Information” category for cases where the department of residence was not recorded.
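The cleaning steps listed above can be sketched with Pandas, which Table 3 names as the tool for data manipulation. This is an illustrative reconstruction, not the authors' script: the helper `normalize_department`, the choice to uppercase and strip accents, and the sample rows are assumptions.

```python
import unicodedata

import pandas as pd


def normalize_department(name: str) -> str:
    """Trim, uppercase, and strip accents; keep the 'No Information' label."""
    text = " ".join(str(name).split())
    if text.lower() == "no information":
        return "No Information"
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.upper()


# Made-up raw rows that use variable names from Table 1.
raw = pd.DataFrame({
    "DEPARTMENT_RESIDENT": ["nariño", " Valle del  Cauca", "NARIÑO", "No Information"],
    "GENDER": ["Male", "Female", "female", "Male"],
    "AGE": ["19", "22", "20", "31"],
})

clean = raw.copy()
# Standardize department names (capitalization, accents, stray spaces).
clean["DEPARTMENT_RESIDENT"] = clean["DEPARTMENT_RESIDENT"].map(normalize_department)
# Homogenize the categorical variable: Male/Female -> M/F.
clean["GENDER"] = (
    clean["GENDER"].str.strip().str.capitalize().map({"Male": "M", "Female": "F"})
)
# Convert numeric fields stored as text to integers.
clean["AGE"] = pd.to_numeric(clean["AGE"]).astype("int64")
```

Stripping accents is only one way to reconcile spellings; mapping every variant to the official accented name would serve the same purpose, as long as both tables use the same convention for the join.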
4.3. Third stage, integration of databases
All admissions records were anonymized, removing personal identifiers to guarantee confidentiality and compliance with data protection standards, while preserving the integrity of analytical variables.
- Admission records were grouped by year, based on the SEMESTER variable (example: 2021–1S and 2021–2S → 2021).
- The records were aligned with the DANE database using the department and year as joining keys.
- The reference population (20–39 years old) was calculated for each department and year, which is attached as the variable POP_20_39.
- The published dataset corresponds to the final, completely anonymized version, which can be reused and reproduced using the steps described above.
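The integration steps above can be sketched with Pandas as follows. The sample values are made up, the long layout assumed for the population table (one row per department, year, and age) may differ from the published files, and the `ACCESS_RATE` column is a hypothetical definition (admitted students divided by POP_20_39) rather than a formula stated in this article.

```python
import pandas as pd

# Made-up admission records using variable names from Table 1.
admitted = pd.DataFrame({
    "ID": ["a1", "b2", "c3", "d4"],
    "SEMESTER": ["2021-1S", "2021-2S", "2021-1S", "2022-1S"],
    "DEPARTMENT_RESIDENT": ["CALDAS", "CALDAS", "CESAR", "CALDAS"],
})

# Assumed long layout for the DANE table: one row per department, year, age.
population = pd.DataFrame({
    "DEPARTMENT": ["CALDAS"] * 3 + ["CESAR"] * 3 + ["CALDAS"] * 3,
    "YEAR": [2021] * 6 + [2022] * 3,
    "AGE": [19, 20, 39] * 3,
    "TOTAL": [100, 200, 300, 50, 60, 70, 110, 210, 310],
})

# 1. Collapse semesters into years: the first four characters hold the year.
admitted["YEAR"] = admitted["SEMESTER"].str[:4].astype(int)

# 2. Reference population aged 20 to 39 (inclusive) per department and year.
pop_20_39 = (
    population[population["AGE"].between(20, 39)]
    .groupby(["DEPARTMENT", "YEAR"], as_index=False)["TOTAL"].sum()
    .rename(columns={"DEPARTMENT": "DEPARTMENT_RESIDENT", "TOTAL": "POP_20_39"})
)

# 3. Attach POP_20_39 to each admission record, joining on department and year.
records = admitted.merge(pop_20_39, on=["DEPARTMENT_RESIDENT", "YEAR"], how="left")

# 4. Hypothetical access rate: distinct admitted students per resident aged 20-39.
access = (
    records.groupby(["DEPARTMENT_RESIDENT", "YEAR"], as_index=False)
    .agg(ADMITTED=("ID", "nunique"), POP_20_39=("POP_20_39", "first"))
)
access["ACCESS_RATE"] = access["ADMITTED"] / access["POP_20_39"]
```

A left join keeps every admission record, so rows in the “No Information” category would simply receive a missing POP_20_39 instead of being dropped.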
4.4. Fourth stage, visualization
To synthesize and communicate the main findings, a set of twelve figures was developed. These visualizations constitute the final stage of the analytical process, translating complex datasets into accessible and interpretable formats. Each figure highlights specific patterns in admissions and population dynamics, allowing for comparisons across time, regions, age groups, and gender.
Collectively, the figures serve as an essential tool for identifying trends, supporting evidence-based discussions, and providing a visual summary of the relationships uncovered throughout the study (Figs. 1–11).
Fig. 1.
Distribution of admitted population by academic period: Shows the temporal evolution of admissions between 2011 and 2023, highlighting fluctuations across semesters and the effect of external events such as the COVID-19 pandemic.
Fig. 2.
Distribution of students by department: Illustrates the geographic origin of admitted students, identifying departments with the highest and lowest representation.
Fig. 3.
Distribution of students by regional campus: Compares the number of students enrolled across the seven regional campuses, evidencing institutional capacity and regional demand.
Fig. 4.
Population growth by department up to the year 2024: Displays official DANE projections, showing demographic changes across departments and providing context for future admissions capacity.
Fig. 5.
Population by age range and gender in the Department of Caldas: Depicts the demographic structure of Caldas, disaggregated by sex and age groups, useful for analyzing access opportunities.
Fig. 6.
Population by age range and gender in the Department of Cesar: Shows the gender and age distribution of the population in Cesar, enabling comparisons with admissions data.
Fig. 7.
Population by age range and gender in the Department of Nariño: Highlights demographic characteristics in Nariño, with emphasis on differences between male and female cohorts.
Fig. 8.
Population by age range and gender in the Department of Valle del Cauca: Represents population distribution by age and gender, contextualizing educational demand in this key department.
Fig. 9.
Distribution of the population by age group and sex in the Department of Arauca: Shows Arauca’s demographic profile, allowing the identification of potential cohorts eligible for higher education.
Fig. 10.
Distribution of the population by age group and sex in the Archipelago of San Andrés: Visualizes the demographic composition of San Andrés, reflecting the challenges of access in island territories.
Fig. 11.
Population by age group and gender in the Department of Amazonas: Presents demographic data of Amazonas, emphasizing the age groups most relevant for higher education entry.
Limitations
The dataset begins with the first semester of 2020, the semester in which the COVID-19 pandemic began [6]. Data were taken from this period onward to show the behavior of each academic semester until post-pandemic normalization, and include statistical data from the population census of the National Administrative Department of Statistics (DANE).
The dataset presented in this article contains anonymized student information; however, although individual identities are protected, a student's records can still be traced within the dataset through the anonymized ID.
Ethics Statement
This study does not involve experiments on humans or animals. This work does not entail gathering information from social media platforms. The only data source used in this work is the involved systems’ relational database, which does not include any user-related social media information. The dataset is not linked to any third-party apps or platforms. The dataset contains no sensitive information. Universidad Nacional de Colombia owns this dataset. Data can be used for multiple purposes within the same research domain. Because the data used in this study has already been anonymized, further anonymization before sharing is not required.
Credit Author Statement
Fernan A. Villa-Garzon: Methodology, Data curation. Maria C. Valencia: Writing-original draft, Visualization, Conceptualization, Methodology. John W. Branch: Supervision, Investigation.
Acknowledgements
This research did not receive any specific funding from public, commercial, or non-profit sectors.
This research was conducted at the Universidad Nacional de Colombia.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Contributor Information
Maria C. Valencia-Garcia, Email: marvalenciaga@unal.edu.co.
Fernan A. Villa-Garzon, Email: favillao@unal.edu.co.
John W. Branch-Bedoya, Email: jwbranch@unal.edu.co.
Data Availability
References
- 1. Quintero R., Pertuz L., Mosalvo J., Amador E., Portnoy I., Acuña-Rodríguez M., Córdova A. Analysis of self-efficacy and attitude-mediated inclusivity in higher education: a case study on the Colombian North Coast. Procedia Comput. Sci. 2024;231:539–544. doi: 10.1016/j.procs.2023.12.247.
- 2. Fernan A. Villa-Garzon, Maria A. Muñoz-Alarcon, John W. Branch-Bedoya. ColombiaTuitionSET: labeled dataset for exploring socioeconomic status, career selection, and tuition fees at a Colombian public university. Data Br. 2025;58. ISSN 2352-3409. doi: 10.1016/j.dib.2024.111242.
- 3. Universidad Nacional de Colombia. Plan Estrategico Institucional (PLEI). 2024. https://plei2034.unal.edu.co/fileadmin/Documentos/Rutas/20240126_PLEI2034_Documento_resumen_Lienzo-Rutas.pdf
- 4. Khoa Bui Thanh, Huynh Tran Trong. The effectiveness of knowledge management systems in motivation and satisfaction in higher education institutions: data from Vietnam. Data Br. 2023;49. ISSN 2352-3409. doi: 10.1016/j.dib.2023.109454.
- 5. Acosta-Vargas Patricia, González Mario, Luján-Mora Sergio. Dataset for evaluating the accessibility of the websites of selected Latin American universities. Data Br. 2020;28. ISSN 2352-3409. doi: 10.1016/j.dib.2019.105013.
- 6. Instituto Nacional de Salud (INS). (2021). COVID-19: progreso de la pandemia y su impacto en las desigualdades en Colombia. https://www.ins.gov.co/Direcciones/ONS/Resumenes%20Ejecutivos/13.%20COVID-19%20progreso%20de%20la%20pandemia%20y%20su%20impacto%20en%20las%20desigualdades%20en%20Colombia.pdf.