MethodsX
. 2025 Feb 7;14:103211. doi: 10.1016/j.mex.2025.103211

PyPortOptimization: A portfolio optimization pipeline leveraging multiple expected return methods, risk models, and post-optimization allocation techniques

Rushikesh Nakhate a, Harikrishnan Ramachandran a, Amay Mahajan b
PMCID: PMC12370148  PMID: 40852566

Abstract

This paper presents PyPortOptimization, an automated portfolio optimization library that incorporates multiple methods for expected returns, risk-return modeling, and portfolio optimization. The library offers a flexible and scalable solution for constructing optimized portfolios by supporting various risk-return matrices, covariance and correlation matrices, and optimization methods. Users can customize the pipeline at every step, from data acquisition to post-processing of portfolio weights, using their own methods or selecting from predefined options. Built-in Monte Carlo simulations help assess portfolio robustness, while performance metrics such as return, risk, and Sharpe ratio are calculated to evaluate optimization results.

  • The study compares the configured methods for each step of the portfolio optimization pipeline, including expected returns, risk modeling, and optimization techniques.

  • Custom-designed allocators outperformed expectations. For example, the Proportional Allocator's Sharpe ratio of 3.4507 out-performed the expected average.

  • A caching system was implemented to optimize execution time.

Keywords: Portfolio optimization, Riskfolio-Lib, PyPortfolioOpt, Monte Carlo simulation

Method name: Run optimization pipeline

Graphical abstract



Specifications table

Subject area: Economics and Finance
More specific subject area: Portfolio Optimization
Name of your method: run_optimization_pipeline
Name and reference of original method: Our approach references the following two methods.
  • PyPortfolioOpt is a library that implements portfolio optimization methods, including classical efficient frontier techniques and Black-Litterman allocation.

  • Riskfolio-Lib is a library for quantitative strategic asset allocation and portfolio optimization in Python.

Resource availability: https://github.com/rushikeshnakhate/PyPortOptimizationPipeline.git

Background

Portfolio optimization [1] is a cornerstone of financial portfolio management, balancing the trade-off between risk and return to maximize investor utility. Despite advancements in the field, implementing a comprehensive and customizable portfolio optimization solution often requires extensive technical expertise and resources. Existing libraries such as PyPortfolioOpt [2] and Riskfolio-Lib [3] have laid the groundwork for accessible portfolio optimization.

PyPortOptimization addresses these challenges by offering a robust, modular, and extensible pipeline for automating the portfolio optimization process.

Simplifying portfolio optimization

Traditional approaches to portfolio optimization are often fragmented, requiring users to integrate multiple tools and methodologies. This can be a daunting task, especially for those new to the field. PyPortOptimization consolidates the entire process, from data acquisition to performance evaluation, into a single, cohesive library. The framework gives users the flexibility to plug in their preferred methods at various stages.

Customizability and flexibility

PyPortOptimization provides flexibility that allows users to customize each step of the pipeline such as expected returns estimation [4], risk-return modeling [5], and post-optimization weight adjustments [6].

Time-Efficient execution

To support the analysis of large datasets and heavy mathematical computation, PyPortOptimization includes an ExecutionTimeRecorder that logs function execution times [7], enabling users to identify and optimize bottlenecks. The library also implements efficient algorithms and data structures to minimize processing time without sacrificing accuracy.

Practical usability

With support for multiple date frequencies (yearly, monthly, multiyear) and post-optimization weight adjustments [6] such as diversity enforcement and transaction cost minimization, PyPortOptimization is designed for professional portfolio management.

Use cases

PyPortOptimization narrows the gap between theoretical portfolio optimization frameworks and practical, user-friendly implementations. Its modular architecture, focus on customization, and integration of advanced techniques empower users to make data-driven investment decisions, ensuring adaptability to ever-evolving financial markets. Whether you are an institutional investor, academic researcher, or personal finance enthusiast, PyPortOptimization is your gateway to effective and innovative portfolio management.

Method details

PyPortOptimization is a library designed to automate and simplify the portfolio optimization process. It provides a customizable and modular pipeline to calculate expected returns [4], create risk-return matrices [5], run portfolio optimization, conduct Monte Carlo simulations [8], post-process weights, and evaluate portfolio performance [9]. This method ensures reproducibility, scalability, and flexibility, enabling researchers to replicate and extend the process for various use cases.

  • Below is a simple usage of PyPortOptimization.

run_optimization_pipeline(years=[2020],
                          tickers=["HDFCBANK.NS", "RELIANCE.NS", "CIPLA.NS"])

  • Usage of PyPortOptimization with all argument details is shown below.

run_optimization_pipeline(
    years,                               # Mandatory
    months=None,                         # Optional
    frequency="yearly",                  # Optional
    tickers=None,                        # Optional, list of tickers
    data_directory="project_directory",  # Default
    expected_return_methods=None,        # Optional, used for calculating expected returns
    risk_return_methods=None,            # Optional, used for risk-return calculations
    optimization_methods=None,           # Optional, used for optimization
    post_processing_methods=None,        # Optional, used for post-processing
)

Arguments detailed information:

years (Mandatory):

  • Specifies the years for which the analysis and portfolio generation will be conducted.

  • Accepts a list of years (e.g., years= [2023, 2024]).

  • The Date Range Generator Library will use this input to determine the overall start and end dates.

months (Optional):

  • Specifies the months within the provided years to focus on.

  • Accepts a list of integers (1 for January, 2 for February, etc.).

  • If not provided, the function will consider all months in the given years for the analysis.

frequency (Optional): determines how the Date Range Generator Library structures date ranges, as described below.

  • monthly: Generates portfolios for each month within the specified years and months. If no specific months are provided, it generates ranges for all 12 months of each year. This is the default.

  • yearly: Generates portfolios for each full year in the specified years. The start date is January 1, and the end date is December 31 for each year.

  • multiyear: Consolidates all specified years into a single range. The start date is January 1 of the first year, and the end date is December 31 of the last year.
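The three frequency options above can be sketched as follows; `generate_date_ranges` is a hypothetical helper written only to illustrate how the (start, end) pairs are derived, not the library's actual API.

```python
from datetime import date
import calendar

def generate_date_ranges(years, months=None, frequency="yearly"):
    """Illustrative sketch of the three frequency options (hypothetical helper)."""
    if frequency == "multiyear":
        # One range spanning all specified years
        return [(date(min(years), 1, 1), date(max(years), 12, 31))]
    if frequency == "yearly":
        # One full-year range per specified year
        return [(date(y, 1, 1), date(y, 12, 31)) for y in years]
    # monthly: all 12 months unless specific months are given
    months = months or range(1, 13)
    return [(date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1]))
            for y in years for m in months]
```

For example, `generate_date_ranges([2023, 2024], frequency="multiyear")` yields a single range from January 1, 2023 to December 31, 2024.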

tickers (Optional):

  • Accepts a list of tickers

  • When tickers are not provided, the function loads the configuration to determine which sources and tickers to fetch.

expected_return_methods (Optional): PyExpectedReturns is a Python library designed for calculating expected returns [4] using various statistical and financial models [10] to forecast asset performance based on historical data. It accepts a list of expected return methods from the supported methods below.

  • ARIMA

  • ArithmeticMeanHistorical

  • BlackLitterman

  • CAPM

  • CAGRMeanHistorical

  • EMAHistorical

  • FamaFrench

  • GordonGrowth

  • HoltWinters

  • LinearRegression

  • RiskParity

  • TWRR
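As an illustration of the simplest estimator in this family, the sketch below computes an arithmetic-mean historical expected return. The function name and the 252-trading-day annualization convention are assumptions for illustration, not PyExpectedReturns' actual API.

```python
import pandas as pd

def arithmetic_mean_historical(prices: pd.DataFrame,
                               periods_per_year: int = 252) -> pd.Series:
    """Annualized arithmetic mean of simple historical returns (illustrative)."""
    returns = prices.pct_change().dropna()   # simple period-over-period returns
    return returns.mean() * periods_per_year

# Toy price history: both assets gain 10% per period
prices = pd.DataFrame({"A": [100.0, 110.0, 121.0], "B": [50.0, 55.0, 60.5]})
mu = arithmetic_mean_historical(prices)
```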

risk_return_methods (Optional): PyRisksCalculator is a Python library that orchestrates the calculation of covariance matrices for multiple risk models, loading them from pickle files if they exist, or calculating and saving them if they do not. This function is part of the risk management pipeline and ensures that covariance matrices are available for further use in portfolio optimization and risk management.

The pipeline accepts a list of risk-return methods from the supported methods below.

  • AutoencoderRiskModel

  • CopulaRiskModel

  • ExponentialCovariance

  • GaussianProcessRiskModel

  • GraphicalLasso

  • KMeansClustering

  • LedoitWolfConstantCorrelation

  • LedoitWolfConstantVariance

  • LedoitWolfShrinkage

  • LedoitWolfSingleFactor

  • OracleApproximatingShrinkage

  • RandomForestVolatility

  • RegimeSwitchingRiskModel

  • SVMVolatility

  • SampleCovariance

  • SemiCovariance
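To illustrate the difference between a plain sample covariance and a shrinkage-style estimator such as Ledoit-Wolf, the sketch below shrinks toward a constant-variance target with a fixed intensity. The helper names are illustrative, and the real Ledoit-Wolf estimator chooses the intensity from the data.

```python
import numpy as np

def sample_covariance(returns: np.ndarray) -> np.ndarray:
    """Plain sample covariance of asset returns (rows = observations)."""
    return np.cov(returns, rowvar=False)

def shrunk_covariance(returns: np.ndarray, delta: float = 0.2) -> np.ndarray:
    """Shrink the sample covariance toward a constant-variance target, the
    idea behind Ledoit-Wolf style estimators. delta is fixed here for
    illustration; the actual estimator derives it from the data."""
    S = sample_covariance(returns)
    target = np.eye(S.shape[0]) * np.trace(S) / S.shape[0]
    return (1 - delta) * S + delta * target
```

Shrinkage pulls extreme off-diagonal entries toward zero while preserving the total variance, which stabilizes the estimate when observations are scarce relative to the number of assets.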

optimization_methods (Optional): PyOptimizers is a Python library designed to simplify portfolio optimization using various risk and return models. It enables users to calculate efficient frontiers and obtain optimized portfolio weights with multiple optimization methods.

The supported optimization methods are listed below.

  • PyPortfolioOptFrontier

  • PyPortfolioOptFrontierWithShortPosition

  • MVRiskFolioOptimizer

  • MADRiskFolioOptimizer

  • MSVRiskFolioOptimizer

  • FLPMRiskFolioOptimizer

  • SLPMRiskFolioOptimizer

  • CVaRRiskFolioOptimizer

  • EVaRRiskFolioOptimizer

  • WRRiskFolioOptimizer

  • MDDRiskFolioOptimizer

  • ADDRiskFolioOptimizer

  • CDaRRiskFolioOptimizer

  • UCIRiskFolioOptimizer

  • EDaRRiskFolioOptimizer
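As a minimal illustration of what a mean-variance optimizer in this family computes, the sketch below derives the classical max-Sharpe (tangency) portfolio in closed form. Unlike the library optimizers listed above, it allows short positions and ignores real-world constraints; the function name and toy inputs are assumptions.

```python
import numpy as np

def tangency_weights(mu: np.ndarray, cov: np.ndarray, rf: float = 0.0) -> np.ndarray:
    """Classical max-Sharpe (tangency) portfolio: weights proportional to
    inv(cov) @ (mu - rf), normalized to sum to one (illustrative sketch)."""
    raw = np.linalg.solve(cov, mu - rf)
    return raw / raw.sum()

# Toy inputs: three assets with assumed expected returns and covariances
mu = np.array([0.12, 0.10, 0.08])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.03, 0.01],
                [0.00, 0.01, 0.02]])
w = tangency_weights(mu, cov)
```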

post_processing_methods (Optional): The Portfolio Allocation Processor (PyPAP) is a Python library designed to allocate and post-process portfolio weights based on different allocation methods.

Below is the list of supported methods.

  • CustomClusteredAllocator

  • CustomDiversityAllocator

  • CustomProportionalRoundingAllocator

  • CustomTransactionCostAllocator

  • CustomWeightedFloorAllocator

  • GreedyPortfolio

  • LpPortfolio

Overview of Pipeline: The key steps of the optimization pipeline are shown in Fig. 1 below:

Fig. 1. Portfolio Optimization Workflow.

Date Range Generator Library:

  • The Date Range Generator Library is a tool designed to generate date ranges based on specified years, months, and frequencies. This library is useful for applications that require generation of time intervals, such as time-series analysis [11], reporting, or scheduling tasks.

  • The library also implements unit tests using Python's unittest framework [12] and can be extended based on new requirements.

import datetime
from src.dataDownloader.main import generate_month_date_ranges

# Specify the year
year = 2024

# Generate ranges for January, February, and March
months = [1, 2, 3]

# Generate the date ranges for the specified months
month_ranges = generate_month_date_ranges(year, months)

# output: [202401, 202402, 202403]

Data Downloader:

  • It's a tool for downloading financial data from multiple sources, including Yahoo Finance [13], Alpha Vantage [14], and Custom CSV files.

  • It provides a framework for fetching data for different asset types, such as stocks and bonds, and allows you to retrieve data for specific tickers or from a predefined configuration.

import os
import pandas as pd
from src.dataDownloader.main import get_data

# Specify parameters
current_dir = os.getcwd()    # Current working directory
start_date = "2024-01-01"    # Start date for data fetch
end_date = "2024-01-31"      # End date for data fetch
tickers = ["AAPL", "GOOGL"]  # List of tickers

# Fetch data for the specified tickers
data = get_data(current_dir=current_dir,
                start_date=start_date,
                end_date=end_date,
                tickers=tickers,
                asset_type="stocks")  # Type of asset (e.g., stocks)

# Display the first few rows of the fetched data
print(data.head())

   Date        Symbol  Close
0  2024-01-02  AAPL   174.50
1  2024-01-03  AAPL   175.00
2  2024-01-04  AAPL   173.75
3  2024-01-31  GOOGL  148.00

PyExpectedReturns Library for Expected Return Calculation:

  • PyExpectedReturns is a Python library for calculating expected returns [4] using various statistical and financial models [10]. The library provides a flexible and configuration-driven approach to calculate and forecast potential asset performance based on historical data.

  • The main advantage of this approach is that once the expected returns [4] for a specific date and method are calculated, they are cached and stored in a pickle file. Subsequent requests for the same data will fetch the results directly from the cache, improving performance and reducing computation time.

  • A new expected return calculation method can be added in a few steps. First, the user defines a new class that implements the expected return calculation logic for the method. Each class includes a calculate_expected_return() method that takes the necessary financial data and returns the calculated expected returns [4].

  • Optional Configuration: For improved flexibility, users can enable or disable specific methods via a configuration file. This ensures that only relevant methods are used in the analysis, based on the research context.
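A hypothetical plug-in following the pattern described above might look like the sketch below. The class name and estimator (annualized median of historical simple returns) are assumptions; only the calculate_expected_return() method name follows the text.

```python
import pandas as pd

class MedianHistoricalReturn:
    """Hypothetical plug-in method: annualized median of historical simple
    returns. Illustrative only; the base interface is an assumption."""

    def __init__(self, periods_per_year: int = 252):
        self.periods_per_year = periods_per_year

    def calculate_expected_return(self, prices: pd.DataFrame) -> pd.Series:
        returns = prices.pct_change().dropna()
        return returns.median() * self.periods_per_year
```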

import pandas as pd
from src.expected_return_calculator import calculate_all_returns

# data: sample DataFrame; alternatively, use the data generated by the
# data downloader or download the closing prices.
data = pd.DataFrame({
    'asset_1': [0.1, 0.2, 0.15, -0.1, 0.05],
    'asset_2': [0.05, 0.15, 0.2, -0.05, 0.1],
    'asset_3': [0.2, 0.3, 0.25, -0.1, 0.15]
})

# List of enabled methods for calculating returns, or leave as None to use the default from config
enabled_methods = ['ARIMA', 'ArithmeticMeanHistorical']

# Calculate expected returns; returns a DataFrame
df_returns = calculate_all_returns(data, output_dir, enabled_methods)

PyRisksCalculator Library for Risk-Return Calculation:

  • PyRisksCalculator is a Python library designed to calculate covariance matrices across multiple risk models. The library ensures that covariance matrices are readily available for further use in portfolio optimization and risk management processes by either loading existing matrices from pickle files or calculating and saving them if they don't exist.

  • The core function in the PyRisksCalculator library is calculate_all_risk_matrix, which computes covariance matrices for multiple risk models, either by calculating them or by loading them from existing pickle files.

  • To add a new risk model to the PyRisksCalculator library, define a new class that implements the calculate_covariance_matrix() method to compute the covariance matrix for the given financial data. Once the class is defined, it can be added to the configuration file to be enabled or disabled.
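The load-or-compute pickle caching behaviour described above can be sketched generically as follows; `load_or_compute` is a hypothetical helper, not PyRisksCalculator's actual function.

```python
import pickle
from pathlib import Path

def load_or_compute(cache_file: Path, compute):
    """Generic load-or-compute pickle cache (hypothetical helper): return the
    pickled result if it exists, otherwise compute it once and persist it."""
    if cache_file.exists():
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

On the second call with the same cache path, the stored result is returned without recomputation, which is the behaviour that makes repeated pipeline runs fast.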

import pandas as pd

# data: Sample data from previous example

# Call the function to calculate and return the covariance matrices for all enabled risk models

covariance_dict = calculate_all_risk_matrix(data)

# Example of accessing the covariance matrix for a specific model

cov_matrix_sample_covariance = covariance_dict.get('SampleCovariance')

PyOptimizers Library for Optimization:

  • The PyOptimizers library simplifies portfolio optimization by providing multiple risk and return models for calculating efficient frontiers and obtaining optimized portfolio weights.

  • It supports a range of optimization methods, enabling users to select the most appropriate technique for their portfolio management strategy.

  • To add a new optimization method to the PyOptimizers library, define a new optimizer class that implements the required logic for portfolio optimization, including the calculation of weights, returns, volatility, and other performance metrics.

  • Once the class is created, it can be added to the configuration file, enabling or disabling it for the optimization process.

from pathlib import Path
import pandas as pd
from pyoptimizers import calculate_optimizations

# Sample inputs
data = pd.DataFrame()  # Your stock data

expected_return_df = pd.DataFrame() # Expected returns

risk_return_dict = {} # Risk-return covariance matrices

# Call the optimization function

optimized_data = calculate_optimizations(data, expected_return_df, risk_return_dict)

Monte Carlo Simulation [8]:

  • The Monte Carlo Simulation [8] Library is designed to help portfolio optimization by generating random portfolios and evaluating their risk-return [15] profiles.

  • It allows for a range of customizations and provides tools to identify the most efficient portfolio for various criteria, like maximum Sharpe ratio or minimum volatility.

  • The Monte Carlo Simulation Library provides the ability to control the number of experiments or iterations through its parameters.

  • Users can specify the number of simulations to run, allowing flexibility in adjusting the granularity and depth of the analysis.

  • This control lets users balance computation cost against simulation accuracy, so they can tune the number of iterations to fit their research or portfolio optimization needs. By adjusting the parameter for the number of experiments, users can refine the simulation to provide more detailed insight into portfolio performance [9].

  • The data generated from each Monte Carlo simulation [8] is appended to the existing historical data.

  • This combined dataset, which merges historical data with simulated data, enables a more comprehensive analysis of portfolio performance [9].

  • By appending simulated scenarios into the historical data, researchers can perform a more robust analysis, capturing a range of potential outcomes and better understanding the risk-return trade-offs [15].

  • This combined approach enhances the accuracy of portfolio optimization and provides valuable details for research.

from monte_carlo_simulation import MonteCarloSimulation
import pandas as pd

# Create a DataFrame with historical data
data = pd.DataFrame({
    "StockA": [0.01, 0.02, -0.01, 0.03],
    "StockB": [-0.02, 0.01, 0.04, 0.01],
    "StockC": [0.03, -0.01, 0.02, 0.05],
})

output_dir = "output_directory_path"

monte_carlo_simulation = MonteCarloSimulation(data, output_dir)

max_sharpe_ratio, min_volatility = monte_carlo_simulation.run_monte_carlo_simulation()
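The random-portfolio idea behind the simulation can be sketched independently of the library as follows; `monte_carlo_portfolios` is an illustrative stand-in, not the library's MonteCarloSimulation class.

```python
import numpy as np

def monte_carlo_portfolios(mu, cov, n_portfolios=5000, rf=0.0, seed=0):
    """Draw random long-only, fully-invested portfolios, score each by
    return, volatility and Sharpe ratio, and return the max-Sharpe and
    min-volatility candidates (illustrative stand-in for the library)."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_portfolios, len(mu)))
    w /= w.sum(axis=1, keepdims=True)                    # normalize weights
    rets = w @ mu                                        # portfolio returns
    vols = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))  # portfolio volatilities
    sharpe = (rets - rf) / vols
    return w[sharpe.argmax()], w[vols.argmin()]
```

Increasing `n_portfolios` samples the feasible region more densely, which is the computation-versus-accuracy trade-off described above.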

PyPAP Library for Post-Processing:

  • The Portfolio Allocation Processor (PyPAP) is a Python library designed for portfolio allocation using various allocation methods.

  • The library includes a range of custom algorithms for post-processing portfolio weights, improving both risk diversification and returns optimization. We have introduced new custom algorithms tailored for specific allocation strategies.

  • CustomClusteredAllocator: Implements clustering techniques [16] to diversify risk across different assets in the portfolio. The allocator clusters assets based on their weights and prices to ensure optimal diversification.

  • This method ensures that assets within the same cluster have similar characteristics, which can result in a more balanced risk-return profile.

  • CustomDiversityAllocator: Ensures optimal diversification by balancing asset allocations [17] according to predefined diversity thresholds. It adjusts the portfolio's weight distribution to reduce concentration in any single asset. By enforcing diversity rules, this allocator prevents excessive exposure to individual assets, ensuring better risk management.

  • CustomGreedyAllocation: A simple but effective approach that follows a greedy algorithm [18] to maximize returns by allocating the budget to assets with the highest weight first. This method is designed to optimize portfolio performance by focusing on the most promising assets first while respecting the budget constraint.

  • CustomProportionalRoundingAllocator: This allocator distributes portfolio value among assets based on their weights (proportional to their relative value in the portfolio). It rounds each asset's allocation to the nearest whole number of shares (using np.round) and handles rounding errors by distributing the remaining budget to assets in order of their original weight, prioritizing those with the highest weight.

  • CustomTransactionCostAllocator: This allocator [17] adjusts the asset weights to account for transaction costs before determining the number of shares or units to buy. The weights are reduced by a transaction cost rate, allocations are then rounded to the nearest whole number of shares, and the remaining budget is returned as an unallocated amount.

  • CustomWeightedFloorAllocator: This allocator [19] computes the allocation by rounding down (using np.floor) each asset's weight as a number of shares based on its price and weight in the portfolio. After determining the floored allocation, it calculates the total value of the portfolio based on the floored shares and calculates any remaining unallocated budget. If there's remaining budget, it redistributes it among higher-weighted assets, aiming to buy as many shares as possible with the remaining funds.

  • This library is designed with caching mechanisms [20] to enhance performance by storing intermediate results and reducing redundant calculations. These allocators are built with flexibility in mind, making them easily extendable for future requirements or modifications. This modular design allows for straightforward integration of additional allocation strategies or custom logic without disrupting the overall system, enabling continuous improvement and adaptability to evolving portfolio management needs.
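As one concrete illustration of the allocation logic described above, the sketch below implements the proportional-rounding idea: each asset's target value is weight times budget, share counts are rounded with np.round, and leftover cash is spent on extra shares in descending-weight order. Names and tie-breaking details are assumptions, not the library's exact code.

```python
import numpy as np

def proportional_rounding_allocation(weights, prices, budget):
    """Illustrative proportional-rounding allocator: round each asset's
    target value to whole shares, then spend the remainder greedily in
    descending-weight order."""
    assets = sorted(weights, key=weights.get, reverse=True)
    shares = {a: int(np.round(weights[a] * budget / prices[a])) for a in assets}
    leftover = budget - sum(shares[a] * prices[a] for a in assets)
    for a in assets:                  # spend the remainder, highest weight first
        extra = int(leftover // prices[a])
        shares[a] += extra
        leftover -= extra * prices[a]
    return shares, leftover

shares, leftover = proportional_rounding_allocation(
    {"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 100, "b": 200, "c": 150}, 10_000)
```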

import pandas as pd

from pathlib import Path

from src.processing_weight import PortfolioAllocationProcessor as PAP

# Sample data: Assume that `results_df` and `data` are already loaded or generated

results_df = pd.DataFrame({

'weights_column': [{'stock_a': 0.5, 'stock_b': 0.3, 'stock_c': 0.2}],

'other_column': ['data']

})

data = pd.DataFrame({

'stock_a': [100, 105, 110],

'stock_b': [200, 210, 215],

'stock_c': [150, 145, 140]

})

# Path to the current month's directory where the results will be saved

current_month_dir = Path("/path/to/current_month_dir")

# If you want to use specific enabled methods:

enabled_methods = ['GreedyPortfolio', 'LpPortfolio']

# Or, if you want to let the function load enabled methods from the config file:

enabled_methods = None # The function will load enabled methods from config

# Run the function to process the portfolio data and allocation

final_results_df = PAP.run_all_post_processing_weight(

results_df=results_df,

data=data,

current_month_dir=current_month_dir,

enabled_methods=enabled_methods,

budget=1_000_000

)

Portfolio Performance Calculation Library for Back Testing:

  • This tool provides the functionality to calculate key performance metrics (such as Volatility, Return, and Sharpe Ratio) for portfolios based on allocation columns.

  • It processes a post processing DataFrame containing portfolio allocations, evaluates the performance using various metrics, and appends the results to the original DataFrame.

  • This process involves back testing: simulating portfolio performance over a historical period to assess how different strategies would have performed in the past.
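The metrics described above can be computed for a fixed-weight portfolio as in the sketch below; the function name and the 252-day annualization convention are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def portfolio_metrics(prices: pd.DataFrame, weights: dict,
                      periods_per_year: int = 252, rf: float = 0.0):
    """Annualized return, annualized volatility and Sharpe ratio of a
    fixed-weight portfolio over a historical price window (illustrative)."""
    rets = prices.pct_change().dropna()
    w = pd.Series(weights).reindex(rets.columns).fillna(0.0)
    port = rets @ w                      # per-period portfolio returns
    ann_return = port.mean() * periods_per_year
    ann_vol = port.std(ddof=1) * np.sqrt(periods_per_year)
    return ann_return, ann_vol, (ann_return - rf) / ann_vol

prices = pd.DataFrame({"asset_1": [100, 105, 110, 108, 107],
                       "asset_2": [200, 195, 193, 191, 190]})
ret, vol, sharpe = portfolio_metrics(prices, {"asset_1": 0.6, "asset_2": 0.4})
```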

import pandas as pd

from pathlib import Path

from mymodule.performance import calculate_performance

# Example post-processing DataFrame with allocation columns

post_processing_df = pd.DataFrame({
    'Allocation_Greedy': ["{'asset_1': 0.4, 'asset_2': 0.6}", "{'asset_1': 0.5, 'asset_2': 0.5}"],
    'Allocation_LP': ["{'asset_1': 0.3, 'asset_2': 0.7}", "{'asset_1': 0.6, 'asset_2': 0.4}"],
    'Greedy_remaining_amount': [1000, 1500],
    'LP_remaining_amount': [2000, 2500]
})

# Sample data (this could be price data or any asset data)

data = {
    'asset_1': [100, 105, 110, 108, 107],
    'asset_2': [200, 195, 193, 191, 190]
}

# Define start and end dates

start_date = "2024-01-01"

end_date = "2024-01-31"

# Define the directory for the current month
current_month_dir = Path("/path/to/data/2024-01")

# Call the function to calculate portfolio performance

updated_df = calculate_performance(post_processing_df, data, start_date, end_date, current_month_dir)

Execution Time Tracking Library:

  • The ExecutionTimeRecorder class acts as a decorator to record the execution time [7] of functions. It measures the start and end time of a function call, calculates the duration, and logs it for performance monitoring.

  • The results, including function names, module names, and execution times [7], are stored in a global DataFrame (results_df). This data can be printed in a readable format using print_results() or accessed via get_performancedataframe().
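A minimal version of such a timing decorator could look like the following sketch; it stores records in a class-level list rather than the library's global DataFrame, and all names here are illustrative.

```python
import functools
import time

class TimeRecorder:
    """Minimal sketch of the timing-decorator pattern described above;
    records go to a class-level list for simplicity (illustrative names)."""
    results = []

    @classmethod
    def record(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            out = func(*args, **kwargs)
            cls.results.append({"function": func.__name__,
                                "module": func.__module__,
                                "seconds": time.perf_counter() - start})
            return out
        return wrapper

@TimeRecorder.record
def slow_add(a, b):
    time.sleep(0.01)
    return a + b
```

Each call to a decorated function appends one record, so repeated calls build up the per-function timing table used to spot bottlenecks.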

Caching and Logging:

  • The execution times of functions are cached in memory within a DataFrame. This avoids redundant logging and provides quick access to previously recorded execution times [7].

  • Additionally, the logging mechanism ensures that each function's execution time is logged to a file or console, depending on the logging configuration, providing an audit trail.

Method validation

Execute the optimization pipeline as below.

tickers = ["HDFCBANK.NS", "RELIANCE.NS", "CIPLA.NS", "DIVISLAB.NS", "HDFCLIFE.NS",
           "BHARTIARTL.NS", "ASIANPAINT.NS", "INFY.NS"]

run_optimization_pipeline(years=[2022, 2023, 2024], tickers=tickers)

The run_optimization_pipeline function was executed on datasets for individual years (2022, 2023, and 2024) for the stocks listed in the code above. When applied to the specified input parameters, the function generated over 2000 portfolios, each configured with varying risk levels, expected returns, optimization techniques, and post-processing weights. From these generated portfolios, the top-performing ones were shortlisted based on validation metrics, including Expected Annual Return, Annual Volatility, Sharpe Ratio [21], and allocation patterns across tickers. The method's efficiency was assessed by analyzing these metrics across the years and identifying trends in allocation strategies and risk-return profiles [15].

The output includes columns for risk type, expected return, optimization method, and related performance metrics such as volatility, return, and Sharpe ratio; for simplicity, only the performance parameters are shown in the tables below.

Referring to the data in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, the Proportional Allocator [22] achieved the highest Sharpe ratio of 3.4507 in 2024, demonstrating its superior risk-adjusted returns. The Diversity Allocator showed consistent performance, with a Sharpe ratio [21] of 3.1169 in 2024, indicating its robustness across different market conditions. The Weighted Allocator [23] and Lp Portfolio provided balanced and stable results, making them ideal for conservative portfolio strategies.

Table 1.

Performance Metrics for Clustered Allocator.

Year Volatility Return Sharpe Ratio
2022 0.1861 128,013.2 578,558.8
2023 0.1394 −19,653.6 −127,638.8
2024 0.1962 −66,315.5 −305,892.4

Table 2.

Performance Metrics for Diversity Allocator.

Year Volatility Return Sharpe Ratio
2022 0.2027 0.6013 1.6618
2023 0.1438 0.2233 1.4678
2024 0.2041 0.7110 3.1169

Table 3.

Performance Metrics for Proportional Allocator.

Year Volatility Return Sharpe Ratio
2022 0.1879 0.1183 0.6285
2023 0.1313 0.1966 1.4126
2024 0.1777 0.5725 3.4507

Table 4.

Performance Metrics for Transaction Allocator.

Year Volatility Return Sharpe Ratio
2022 0.1842 0.4000 1.5263
2023 0.1392 0.2166 1.4554
2024 0.1942 0.3575 2.7290

Table 5.

Performance Metrics for Weighted Allocator.

Year Volatility Return Sharpe Ratio
2022 0.1842 0.3315 1.2656
2023 0.1393 0.2186 1.4700
2024 0.1932 0.2350 2.4615

Table 6.

Performance Metrics for Greedy Portfolio.

Year Volatility Return Sharpe Ratio
2022 0.2142 0.1182 0.6274
2023 0.1419 0.2161 1.4572
2024 0.2273 0.4586 2.9127

Table 7.

Performance Metrics for Lp Portfolio.

Year Volatility Return Sharpe Ratio
2022 0.2142 0.1348 0.6708
2023 0.1419 0.2162 1.4574
2024 0.2275 0.4579 2.9117

Table 8.

Performance Metrics for Expected Values.

Year Volatility Return Sharpe Ratio
2022 0.1543 0.3214 1.2578
2023 0.1245 0.2987 1.1345
2024 0.1678 0.3467 1.3890

Expected vs Actual Portfolio Performance Metrics:

  • The expected values served as a benchmark, representing baseline assumptions for annual returns, volatility, and Sharpe ratio [21].

  • Actual allocator performance consistently exceeded these expected values. For instance, the Proportional Allocator's [22] Sharpe ratio of 3.4507 significantly outperformed the expected average.

Insights and Observations:

  • The caching mechanism ensures efficient execution for iterative analysis, saving time and computational resources.

  • Allocators such as the Proportional Allocator [22] and Diversity Allocator excel in high-risk, high-return scenarios.

  • Balanced methods like the Weighted Allocator [23] and Lp Portfolio are suitable for steady, low-risk strategies.

  • The reproducibility of results highlights the robustness and reliability of the system for both research and practical applications.

First Execution: During the initial execution, logs captured detailed information as listed below. The process ensured accurate computations and thorough validation of all steps.

  • Dates of processing.

  • Stocks used for allocation.

  • Methods enabled at each step.

  • Errors and warnings produced at each step, with detailed messages.

  • Comprehensive performance metrics such as volatility, return, and Sharpe ratio.

Second Execution (Caching):

  • When we re-run the pipeline for the same set of inputs, cached values were utilized for previously calculated steps, significantly reducing execution time.

  • Redundant calculations were bypassed, demonstrating the efficiency of the caching mechanism.

  • Outputs matched those of the first execution, validating the integrity and reliability of the system.

Reproducibility:

  • The workflow ensured consistent results for identical inputs across multiple runs. Logs and cached data validated the repeatability of the process.

  • Generated performance metrics, including volatility, return, and Sharpe ratio, were identical across executions, confirming the system's robustness. This validates the enhanced efficiency and optimization capabilities of the allocators.

Scalability and Efficiency:

  • The system successfully processed large datasets using the caching mechanism.

  • By reducing computational overhead during repeated runs, the workflow demonstrated scalability and adaptability for complex, large-scale portfolio optimization tasks.

Limitations

Many steps in the pipeline (e.g., expected return calculation, risk-return modeling, optimization methods) are computationally intensive. These tasks are currently executed sequentially, which limits processing efficiency. The lack of parallel or distributed computing capabilities hinders performance, particularly when working with large datasets or complex configurations.

Currently, the pipeline supports only stock data. Expanding the library to support other asset classes such as bonds, commodities, real estate, and cryptocurrencies would enhance its versatility, but is not yet implemented.

Ethics statements

Not applicable

CRediT authorship contribution statement

Rushikesh Nakhate: Conceptualization, Methodology, Software, Writing – original draft. Harikrishnan Ramachandran: Supervision, Project administration, Writing – review & editing. Amay Mahajan: Formal analysis, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to express their gratitude to all individuals who have contributed to the study. We appreciate the valuable input and support received from colleagues, mentors, and institutions during the course of this research.

Footnotes

Related research article

None.

Data availability

Data will be made available on request.
