## Abstract

Groundwater is often one of the most significant natural sources of freshwater supply, especially in arid and semi-arid regions, and is of paramount importance. This study presents a new, highly accurate technique for forecasting groundwater level (GWL). The artificial intelligence (AI) models used are the multi-layer perceptron (MLP) artificial neural network (ANN), the radial basis function (RBF) network, and the adaptive neural-fuzzy inference system (ANFIS). The input data to the models are the monthly average GWL of 17 piezometers. Two data preprocessing schemes were applied: the discrete wavelet transform (DWT) and the simultaneous use of several discrete wavelet transforms (multi-discrete wavelet transform, M-DWT). The results showed that the hybrid M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models had higher accuracy in forecasting GWL 1, 2, 3, and 6 months ahead than the corresponding DWT-ANN, DWT-RBF, and DWT-ANFIS models, and than the regular ANN, RBF, and ANFIS models. The M-DWT-ANN model had the best performance overall. The results illustrate that using the M-DWT method for input-data preprocessing can be a valuable tool for increasing a predictive model's accuracy and efficiency, and indicate the potential of hybrid M-DWT-AI models to improve GWL forecasting.

## HIGHLIGHTS

In this study, a new method based on the multi-discrete wavelet transform is proposed for groundwater level forecasting.

The proposed technique improved the predictive accuracy of the models compared to the reference and single wavelet-based models.

ANN, RBF, and ANFIS models were used and hybridized with the multi-discrete wavelet transform.

The artificial intelligence models were compared, and the M-DWT-ANN hybrid model performed best.

### Graphical Abstract

## NOMENCLATURE

| Symbol | Definition |
| --- | --- |
| AI | Artificial Intelligence |
| ANFIS | Adaptive Neural-Fuzzy Inference System |
| ANN | Artificial Neural Network |
| ARIMA | Autoregressive Integrated Moving Average |
| ARIMAX | Autoregressive Integrated Moving Average with Explanatory Variable |
| bior | Biorthogonal Wavelet |
| BR | Bayesian Regularization |
| coif | Coiflet Wavelet |
| CWT | Continuous Wavelet Transform |
| db | Daubechies Wavelet |
| DWT | Discrete Wavelet Transform |
| FEFLOW | Finite-Element Ground Water Flow |
| FIS | Fuzzy Inference System |
| fk | Fejer-Korovkin Wavelet |
| FT | Fourier Transform |
| GDX | Gradient Descent with Momentum and Adaptive Learning Rate |
| GWL | Groundwater Level |
| LM | Levenberg–Marquardt |
| M-DWT | Multi-Discrete Wavelet Transform |
| MF | Membership Function |
| MLP | Multi-Layer Perceptron |
| MODFLOW | Modular Three-Dimensional Finite-Difference Groundwater Flow Model |
| NSE | Nash–Sutcliffe Efficiency |
| NRMSE | Normalized Root Mean Square Error |
| R | Correlation Coefficient |
| RBF | Radial Basis Function |
| RMSE | Root Mean Square Error |
| SCG | Scaled Conjugate Gradient |
| SVM | Support Vector Machine |
| SVR | Support Vector Regression |
| sym | Symlet Wavelet |
| WA | Wavelet Analysis |
| WT | Wavelet Transform |

## INTRODUCTION

Groundwater is often one of the most significant natural freshwater resources globally, particularly in arid and semi-arid regions (Li *et al.* 2013). In these areas, groundwater resources are essential because freshwater is scarce, precipitation is highly variable, and the evaporation rate is high (Jolly *et al.* 2008). Groundwater use is increasing in arid and semi-arid areas due to population growth, rapid economic development, demand for water resources, low cost, high quality, and broad access (Sreekanth *et al.* 2011; Mirzavand *et al.* 2015). At the same time, sustainable management of groundwater resources in these regions is very challenging because the stresses caused by human activities have made the groundwater cycle and its dynamic processes very complex and nonlinear (Sreekanth *et al.* 2011). Knowledge of GWL fluctuations and accurate GWL forecasting are necessary for water managers and engineers to adopt better strategies, reduce the destructive effects of climate change, prevent overuse of groundwater, and develop water resources (Jalalkamali *et al.* 2011). For GWL forecasting, there are two main approaches: physically based numerical models (e.g., MODFLOW, FEFLOW) and data-driven models (e.g., artificial neural networks, ANN; autoregressive integrated moving average, ARIMA) (Adamowski & Chan 2011; Mohanty *et al.* 2013). Numerical and other physically based methods help, through conceptualization, to understand the hydrological processes being modeled, but they have practical limitations. Most of these models are complex because they require several different and accurate inputs and a large amount of data for calibration and verification (Jha & Sahoo 2015). In many regions, hydrogeological data are not easily accessible, and gathering information about the GWL is time-consuming and costly.
In addition, conceptualizing a subsurface process and estimating its parameters inevitably involve uncertainty, because most of these variables show complex and nonlinear behavior due to spatial and temporal changes. This nonlinear behavior is a major cause of uncertainty in forecasting results and makes them difficult to evaluate accurately (Mustafa *et al.* 2012). Moreover, achieving accurate forecasts is usually more important than comprehending the process and the mechanisms that create it, so simple data-based models (such as AI models) can be a good alternative (Ebrahimi & Rajaee 2017).

Data-driven models require minimal data, and their run time is short. These models have proven accurate and efficient in forecasting hydrological phenomena (Adamowski 2008). Recently, researchers have increasingly resorted to artificial intelligence (AI) to forecast nonlinear data such as hydrological processes. This modeling approach relies on AI to detect patterns and natural relationships between GWL and various hydrological variables in a complex system, without creating a conceptual model or understanding the physical mechanism of the system. Meanwhile, the GWL time series contains sufficient and vital information about an aquifer's dynamics, so future GWL fluctuations can be forecast from past GWL fluctuations (Rajaee *et al.* 2019). Thus, in recent years, AI models including ANN, adaptive neural-fuzzy inference system (ANFIS), and support vector machine (SVM) models have been accepted as useful and suitable alternative tools for forecasting hydrological systems, and they have been used effectively and widely for GWL forecasting (e.g., Lallahem *et al.* 2005; Nayak *et al.* 2006; Mohanty *et al.* 2010; Sreekanth *et al.* 2011; Yoon *et al.* 2011; Shiri *et al.* 2013a; Tapoglou *et al.* 2014; Gong *et al.* 2016; Ghose *et al.* 2018; Abdel-Fattah *et al.* 2021). When field data are insufficient and accurate forecasting and estimation are preferable to understanding the system's process and mechanism, a data-driven (black-box) model can be a suitable alternative. Although an essential feature of AI methods is their ability to detect behaviors and patterns in different time series, when these data are highly non-stationary, ANN, ANFIS, ARIMA, ARIMAX, and other linear and nonlinear methods often cannot handle the time series unless the input data are preprocessed (Cannas *et al.* 2006; Adamowski & Chan 2011).

Preprocessing input data leads to more effective training of AI models (such as ANN and ANFIS). The wavelet transform (WT) is one of the most momentous data preprocessors. This tool has been widely used in hydrological modeling for decomposing non-stationary time series and coupling with other methods (Labat *et al.* 2000; Adamowski 2008). WT can reveal and extract various features concealed in the physical structure of the data that other signal analysis methods might miss. Another feature of WT is the variety of mother wavelets, which are selected according to the specifications of the time series at hand. By analyzing the data and decomposing them at different resolution levels, WT yields useful information embedded in the time series, and using this information increases the ability of forecasting models (Daubechies 1990). WT is thus an efficient mining tool for non-stationary data analysis in AI forecasting (Nourani *et al.* 2014) and has proven to be an efficient mathematical tool (Adamowski & Chan 2011).

Wavelet-AI hybrid models have been accepted in recent years as a potentially helpful approach to modeling hydrological processes and have been used in many different areas, covering a wide range of processes including rainfall-runoff, precipitation, water quality, and GWL forecasting (e.g., Kisi 2008; Nourani *et al.* 2009; Adamowski & Sun 2010; Shiri & Kisi 2010; Rajaee 2011; Shiri *et al.* 2013b; Barzegar *et al.* 2017; Ebrahimi & Rajaee 2017; Yu *et al.* 2018; Graf *et al.* 2019; Jeihouni *et al.* 2019; Tayyab *et al.* 2019; Zhang *et al.* 2019). These researchers have shown that wavelet-based coupled models give more accurate results than regular models such as ANN, ANFIS, and the radial basis function (RBF) network. The discrete wavelet transform (DWT) can be hybridized with ANN, SVM, RBF, and ANFIS models to create hybrid models entitled DWT/WA-ANN, DWT/WA-SVM, DWT/WA-RBF, and DWT/WA-ANFIS (Nourani *et al.* 2014). Preprocessing data before employing them as input to ANN, RBF, and ANFIS (or other data-driven models) can considerably improve the performance of these models (Moosavi *et al.* 2012). For example, Kisi (2009) examined the ANN model coupled with wavelet analysis (WA-ANN) for daily river flow forecasting and compared it with single ANN models; his results showed that wavelet preprocessing could considerably increase forecasting accuracy for daily intermittent streamflow. Adamowski & Chan (2011) first proposed the idea of hybridizing wavelet-AI models for 1-month-ahead GWL forecasting. They found that the GWL forecast of the WA-ANN model was more precise than those of the ANN and ARIMA models, suggesting that preprocessing input data by WT improves the performance of the ANN model in forecasting.
Moosavi *et al.* (2013) compared the ANN, ANFIS, WA-ANFIS, and WA-ANN models for 1-, 2-, 3-, and 4-month-ahead GWL forecasting; the results demonstrated that the WA-ANN and WA-ANFIS models performed better than the ANN and ANFIS models. Suryanarayana *et al.* (2014) applied the WA-ANN and WA-SVR models to forecast GWL 1 month ahead for three observation wells in India and compared them with the ANN, SVR, and ARIMA models, revealing that the models preprocessed by WT are more accurate. Ebrahimi & Rajaee (2017) compared the WA-ANN, WA-MLR, and WA-SVR models in 1-month-ahead GWL simulation; the results indicated that decomposing the GWL time series by wavelet significantly improved the training of the models, and the best model was WA-ANN. Tayyab *et al.* (2019) examined rainfall-runoff modeling in a river basin in China using neural networks integrated with the DWT (i.e., DWT-ANN and DWT-RBF); their analysis validated that the ANN and RBF models integrated with the DWT have better predictability than the single models. Zhang *et al.* (2019) employed a data-driven ANN model hybridized with a wavelet (i.e., W-ANN) to forecast GWL on Zhoushan Island, China; the hybrid models performed better in forecasting, especially over short-term periods. Band *et al.* (2021) used the DWT-SVR model to predict the GWL of the Semnan plain (an arid region) in Iran and found that using the subseries obtained from the WT as input to the SVR model reduced its forecast error for one month ahead.

In this study, a new data-preprocessing method, the multi-discrete wavelet transform (M-DWT), is used to achieve more accurate forecasts from the AI models. Most data on natural phenomena contain different kinds of information and depend on various factors; in other words, such data are composed of several frequencies, and the same is true of the GWL data used here. The rationale for applying the M-DWT technique for data preprocessing is that each DWT extracts different hidden aspects of the time series, so using several of them simultaneously improves the learning and performance of the AI models. For this purpose, hybrid M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models were developed. In addition, the results of DWT-ANN, DWT-RBF, and DWT-ANFIS models and of the single ANN, RBF, and ANFIS models are presented and evaluated for GWL forecasting. The present study investigates monthly GWL forecasting in the Chamchamal-Bisotun basin, Kermanshah, in western Iran.

## MATERIALS AND METHODS

### Study area and data

As shown in Figure 1, the study aquifer, Chamchamal-Bisotun, is located in Kermanshah province in western Iran. The region covers 211 km^{2} and has a semi-arid climate, lying between latitudes 34°20′N–34°36′N and longitudes 47°21′E–47°37′E. The average thickness of the aquifer layer in the plain is about 109 m, and the average thickness of the saturated layer is 97 m. Elevation ranges from 1,255 m above mean sea level (AMSL) in the middle of the plain to 1,457 m in the southwest. The Gamasiab River flows east–west and the Dinavar River flows north–south, and the two rivers join near the middle part of the plain; both are tributaries within the Karkheh River basin. The average annual temperature over 1991–2010 in the study area is about 27.2 °C, and the average monthly maximum and minimum temperatures are 38.57 °C in July and −3.93 °C in January, respectively. The average annual rainfall during this period is 473 mm, ranging from 388 mm in the southern plain to 563 mm in the northern region. Precipitation mainly occurs between November and April. Most of the Chamchamal-Bisotun plain consists of alluvial deposits. The alluvial (porous) unconfined aquifer in this plain includes a complex distribution of gravel, sand, grit, silt, and clay and is typical of the depositional aquifer systems in many plains of the Zagros Mountains. A karst formation surrounds the Chamchamal-Bisotun plain, and the geological formation of the study area consists of young terraces and gravel fans (Figure 1). The primary groundwater recharge sources are groundwater inflow, precipitation, infiltration from the two major rivers, and irrigation return flow, while the main discharge components in this region are exploitation wells and evaporation (Mohammadi 2008).
To estimate the average GWL of the studied aquifer, 17 piezometers were used, with monthly observed GWL data covering 26 years and 9 months (April 1991–December 2017). Figure 2 shows the groundwater level of the Chamchamal-Bisotun aquifer for two different months. These data come from the archives of the regional water company of Kermanshah; sampling and recording were done at 1-month intervals for all piezometers. The locations of the piezometers across the aquifer in the Chamchamal-Bisotun plain are shown in Figure 1. Although most prior studies (e.g., Mohanty *et al.* 2010; Adamowski & Chan 2011; Yoon *et al.* 2011; Suryanarayana *et al.* 2014) used one or more single observation wells to forecast GWL, this study employed the average GWL of the plain; how the data are used can be chosen according to the approach and goals of model development. In this regard, to prevent a drop in GWL, the amount extracted from groundwater resources must be less than the amount of recharge (Kinzelbach *et al.* 2003).

The weighted GWL can be calculated using Equation (1):

$$\overline{GWL} = \frac{\sum_{i=1}^{n} GWL_i \, A_i}{\sum_{i=1}^{n} A_i} \qquad (1)$$

where $\overline{GWL}$ is the weighted average of the GWLs, $GWL_i$ is the measurement at each piezometer, and $A_i$ is the area of each polygon.

The long-term average GWL of the plain during 1991–2017 shows a decrease of about 4.11 m. Most of the agriculture in the study area depends on groundwater resources, and if the GWL decreases further, problems will increase.
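The area-weighted average in Equation (1) can be sketched in a few lines of Python; the piezometer readings and polygon areas below are hypothetical, purely for illustration:

```python
def weighted_gwl(levels, areas):
    """Area-weighted average GWL: sum(GWL_i * A_i) / sum(A_i)."""
    return sum(l * a for l, a in zip(levels, areas)) / sum(areas)

# Hypothetical readings (m AMSL) and polygon areas (km^2) for three piezometers
print(weighted_gwl([1300.0, 1310.0, 1305.0], [2.0, 1.0, 1.0]))  # 1303.75
```

Note that a piezometer covering twice the polygon area contributes twice the weight, so the result differs from the plain arithmetic mean (1305.0 here).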

In this study, 321 monthly GWL measurements (April 1991–December 2017) of the Chamchamal-Bisotun plain were used as input data to the AI models. In most studies, the data set is divided into two parts, which can be sufficient in the modeling process (Nourani *et al.* 2015; Rajaee *et al.* 2019). All the data used were divided into training (calibration) and testing (verification) sets. The first 257 samples (i.e., 80% of the total data set) were used to develop (train) the models, and the remaining 64 samples (i.e., 20% of the whole data set) were used to evaluate (test) them (e.g., Tapoglou *et al.* 2014; Chang *et al.* 2016). Forecast horizons of 1, 2, 3, and 6 months ahead were selected.
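The chronological 80/20 split and the construction of lagged input-target pairs for an h-month-ahead forecast can be sketched as follows; the lag count of 3 is illustrative only, not the paper's actual input configuration:

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of n_lags past values with the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])   # inputs: GWL(t-n_lags) ... GWL(t-1)
        y.append(series[t + horizon - 1])  # target: GWL at t + horizon - 1
    return X, y

data = list(range(321))            # stand-in for the 321 monthly GWL values
n_train = round(0.8 * len(data))   # 257 training samples, leaving 64 for testing
train, test = data[:n_train], data[n_train:]
X, y = make_supervised(train, n_lags=3, horizon=1)
```

Keeping the split chronological (rather than random) matters for time series: the test period must lie strictly after the training period.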

### Model performance criteria

The performance of the models was evaluated using the correlation coefficient (R), the Nash–Sutcliffe efficiency (NSE), the root mean square error (RMSE), and the normalized root mean square error (NRMSE):

$$R = \frac{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)\left(F_i-\bar{F}\right)}{\sqrt{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)^2\,\sum_{i=1}^{N}\left(F_i-\bar{F}\right)^2}}$$

$$NSE = 1-\frac{\sum_{i=1}^{N}\left(O_i-F_i\right)^2}{\sum_{i=1}^{N}\left(O_i-\bar{O}\right)^2}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i-F_i\right)^2}$$

$$NRMSE = \frac{RMSE}{\bar{O}}\times 100$$

where *N* is the number of data used, *F*_{i} are the forecasted values (model outputs), *O*_{i} are the observed data, and $\bar{F}$ and $\bar{O}$ are the average values of *F*_{i} and *O*_{i}, respectively.

The R values range from 0 to 1. A value of 1 represents the model's accuracy and efficiency and a complete relationship between the forecasted and observed data, while a value of 0 indicates no statistical correlation between them. The NSE, used to evaluate the predictive power of hydrological models, ranges between −∞ and 1. An efficiency of one (NSE = 1) represents complete agreement of the forecasted values with the observed values, whereas a value of zero (NSE = 0) indicates that the model forecast is only as precise as the mean of the observed data. An efficiency of less than zero (NSE < 0) is obtained when the mean of the observed data is a better predictor than the model's forecasted values. The RMSE measures the variance of the errors independently of the sample size and represents the difference (the errors) between the observed data and the forecasted values; a value of 0 indicates a perfect fit. The NRMSE criterion removes the scale of the data and expresses the error as a percentage. A high value of NSE and R (maximum one) and small RMSE and NRMSE (near zero) represent high model efficiency; the best fit between forecasted and observed values is R = 1, NSE = 1, RMSE = 0, and NRMSE = 0% (Boyce *et al.* 2015; Nourani *et al.* 2015; Gong *et al.* 2016; Barzegar *et al.* 2017).
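The four criteria can be computed directly from paired observed and forecasted series; this sketch assumes NRMSE is normalized by the observed mean (some studies normalize by the observed range instead):

```python
import math

def evaluate(obs, fc):
    """Return R, NSE, RMSE, and NRMSE (normalized by the observed mean, in percent)."""
    n = len(obs)
    mo, mf = sum(obs) / n, sum(fc) / n
    sse = sum((o - f) ** 2 for o, f in zip(obs, fc))      # sum of squared errors
    r = (sum((o - mo) * (f - mf) for o, f in zip(obs, fc))
         / math.sqrt(sum((o - mo) ** 2 for o in obs)
                     * sum((f - mf) ** 2 for f in fc)))
    nse = 1 - sse / sum((o - mo) ** 2 for o in obs)
    rmse = math.sqrt(sse / n)
    nrmse = 100 * rmse / mo
    return r, nse, rmse, nrmse
```

For a perfect forecast the function returns R = 1, NSE = 1, RMSE = 0, and NRMSE = 0%, matching the best-fit values stated above.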

### Artificial neural network

An attractive feature of the ANN is that it performs well even when the data set contains noise and measurement errors, and even if the system behavior is complex and nonlinear. One advantage of neural networks over other models is that they require fewer input data and run quickly. Another advantage is that the network can be trained without previous knowledge or a complete understanding of the actual physical process of the phenomenon, finding and extracting the exact relationship between the input and output data (Jalalkamali *et al.* 2011).

The ANN architecture used here is also known as the multi-layer perceptron (MLP) network. This network includes an input layer, usually one or two hidden layers, and an output layer, and each layer is composed of one or more artificial neurons (Kim & Valdes 2003). The MLP is one of the most widely used neural networks for hydrological modeling, and it can recognize latent and nonlinear patterns (Nayak *et al.* 2006). MLPs are often used in hydrological forecasting due to their simplicity. In the present study, the back-propagation algorithm was applied to train the MLP model for GWL forecasting. The architecture of a typical MLP network with one hidden layer, a logistic (sigmoid) activation function, and a linear function in the output layer is shown in Figure 3. The feed-forward MLP can be expressed mathematically as follows:

$$y_k(t) = f_o\!\left[\sum_{j=1}^{m} w_{jk}\, f_h\!\left(\sum_{i=1}^{n} w_{ij}\, x_i(t) + w_{bj}\right) + w_{bk}\right]$$

where *n* is the number of input variables; *m* is the number of neurons in the hidden layer; *w*_{ij} and *w*_{jk} are the weights of the neurons in the input and output layers, respectively; *w*_{bj} and *w*_{bk} are the biases in the hidden and output layers, respectively; *f*_{h} and *f*_{o} are the activation functions of the neurons in the hidden and output layers, respectively; and *x*_{i}(*t*) and *y*_{k}(*t*) are the *i*th input variable and the *k*th output variable at time step *t*, respectively (Kim & Valdes 2003).

For ANN model development, the design of the network structure and the choice of training algorithm are critical; an algorithm that provides proper forecasting performance is needed. Commonly used training algorithms include Levenberg–Marquardt (LM), scaled conjugate gradient (SCG), gradient descent with momentum and adaptive learning rate (GDX), and Bayesian regularization (BR) (Mohanty *et al.* 2010; Gong *et al.* 2016). In these algorithms, the error between the target and forecasted output is back-propagated through the network, and the weights linking the neurons are updated during the learning phase. MLPs can perform well in function approximation provided that there are sufficient neurons in the hidden layer and enough data are available (Cybenko 1989; Principe *et al.* 2000). Numerous studies have indicated that one or two hidden layers may be enough for the ANN model to approximate any intricate nonlinear process in forecasting and simulating hydrological systems (e.g., Belayneh *et al.* 2014; Nourani *et al.* 2015; Piotrowski *et al.* 2015). With a second hidden layer, the local and global properties are extracted by the fitting function in the first and second hidden layers, respectively (Funahashi 1989; Haykin 1999). In this study, preliminary results indicated that one or two hidden layers are sufficient to approximate the relationship between observed and forecasted GWL. The optimum number of neurons in the hidden layers was determined by trial and error. The stopping criteria during training were based on cross-validation and included Mu (1.00e^{+10}), gradient (1.00e^{−7}), and maximum iterations (1,000 epochs).
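A minimal forward pass matching the MLP equation above (sigmoid hidden layer, linear output) can be written directly; the weights and lagged GWL inputs below are illustrative, not trained values:

```python
import math

def sigmoid(z):
    """Logistic activation f_h used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, w_in, b_in, w_out, b_out):
    """y = f_o( sum_j w_jk * f_h( sum_i w_ij * x_i + w_bj ) + w_bk ),
    with f_h = sigmoid and f_o = identity (linear output layer)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_row, x)) + b)
              for w_row, b in zip(w_in, b_in)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Two hidden neurons fed by three lagged GWL inputs (illustrative weights)
y = mlp_forward([0.2, 0.4, 0.1],
                w_in=[[0.5, -0.3, 0.8], [0.1, 0.9, -0.2]],
                b_in=[0.0, 0.1],
                w_out=[1.2, -0.7], b_out=0.05)
```

Training (LM, SCG, GDX, or BR) then consists of adjusting `w_in`, `b_in`, `w_out`, and `b_out` to minimize the error that is back-propagated through this same computation.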

### Radial basis function neural network

An RBF neural network is a type of feed-forward neural network employed for function approximation, time-series forecasting, system control, classification, noisy interpolation, and regularization (Kegl *et al.* 2000). RBFs can detect patterns, simulate different time-series behaviors, and be used to replace conventional MLP networks (Ahmed *et al.* 2015). The RBF has received attention due to its simple architecture, good generalization, robust tolerance to noisy data, fast convergence, high reliability, and computational efficiency (Girosi & Poggio 1990). On this basis, the RBF network has been employed increasingly in hydrological forecasting (e.g., Govindaraju & Rao 2000; Chang & Chen 2003; Sahoo & Ray 2006; Chen *et al.* 2010; Yan & Ma 2016; Zhang *et al.* 2017; Tayyab *et al.* 2019). Similar to the MLP network, the RBF network structure consists of three layers: the input layer (X), the hidden layer (H), and the output layer (Y).

It is worth mentioning that the RBF configuration considered has just one hidden layer. The input layer contains the source nodes that link the network to its external inputs. The hidden layer converts the data from the input space to the hidden space and applies a nonlinear function in a simple process, whereas the transformation from the hidden layer to the output layer is linear, and this layer acts as a summation unit (Chen *et al.* 2010). Figure 3 illustrates the basic structure of a three-layered RBF network.

The hidden layer includes several simple processing nodes (neurons), each with a parameter vector called the center and an activation function that transforms the incoming signal (data). The activation function measures the Euclidean distance between the input vector and the center of a hidden node and finally enables the network to learn (Chen *et al.* 2010). With a Gaussian activation function, the computation of the hidden layer has the following form:

$$\phi_i(X) = \exp\!\left(-\frac{\left\|X - c_i\right\|^2}{2\beta_i^2}\right)$$

where the input data *X* is an *N*-dimensional vector, matching the dimensionality of the neurons in the hidden layer, *X* = [*x*_{1}, *x*_{2}, …, *x*_{n}]^{T}; *c*_{i} is the center vector of the *i*th neuron in the hidden layer; || || denotes the Euclidean norm; ɸ_{i}() is the activation function; *h* is the number of neurons in the hidden layer; and *β*_{i} is the center width. The activation ɸ_{i}(*X*) grows as the input vector approaches the center of the activation function. In the output layer, the weighted sum of the incoming signals is transformed into the network output with a linear activation function, and a bias term is added. The biases and weights are updated by back-propagation of the error. The activity of the *j*th node in the output layer, *y*_{j}, can be calculated as follows (ASCE 2000):

$$y_j = \sum_{i=1}^{h} w_{ji}\,\phi_i(X) + w_0$$

where ɸ_{i}(*X*) is the response of the *i*th hidden node resulting from all input data; *w*_{ji} is the connecting weight between the *i*th node in the hidden layer and the *j*th node in the output layer; and *w*_{0} is the bias term.
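The two equations above (Gaussian hidden units followed by a linear summation) can be combined into one short forward pass; the centers, widths, and weights below are illustrative, since in practice they are fitted during training:

```python
import math

def rbf_forward(x, centers, widths, weights, bias):
    """y_j = sum_i w_ji * phi_i(X) + w_0, with Gaussian hidden units
    phi_i(X) = exp(-||X - c_i||^2 / (2 * beta_i^2))."""
    out = bias
    for c, beta, w in zip(centers, widths, weights):
        d2 = sum((xk - ck) ** 2 for xk, ck in zip(x, c))  # squared Euclidean distance
        out += w * math.exp(-d2 / (2.0 * beta * beta))
    return out

# Illustrative 2-D input with two hidden units
y = rbf_forward([0.3, 0.7],
                centers=[[0.0, 0.0], [1.0, 1.0]],
                widths=[1.0, 0.5],
                weights=[2.0, -1.0], bias=0.1)
```

An input lying exactly on a center activates that unit fully (ɸ = 1), while distant centers contribute almost nothing, which is why the network interpolates locally.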

### Adaptive neural-fuzzy inference system

The fuzzy set is a method of describing non-probabilistic uncertainties. Years later, the ANFIS model was first introduced by Jang (1993) as an efficient global approximation tool for vague and fuzzy systems; it combines an adaptive neural network with a fuzzy inference system (FIS). ANFIS is therefore organized to take advantage of the best features of the FIS and the ANN, such as the reduction of the optimization search space by the FIS and back-propagation through the network by the ANN (Jang *et al.* 1997). The FIS structure and mechanism take the input data and specify their input membership functions (MFs); the FIS then correlates the input MFs to rules and the rules to a set of output specifications, and eventually transforms the output specifications into output MFs and the output MFs into a crisp output (Jang *et al.* 1997). A FIS consists of five layers of nodes as follows (Jang 1993): the first layer, which receives external signals and applies the if-then rules (input nodes); the second layer, which defines the MFs of the fuzzy sets used (rule nodes); the third layer, or decision layer (average nodes); the fourth layer, which is the fuzzification interface (consequent nodes); and the fifth layer, which converts the fuzzy results (defuzzification) into the model output (output nodes). Learning algorithms are employed in the ANFIS training process to determine the relationship between input and output. In the current study, the Sugeno method was used for GWL forecasting; Sugeno-based ANFIS has been widely applied for modeling and forecasting nonlinear systems such as hydrological processes in the literature (e.g., Vernieuwe *et al.* 2005; Mohammadi 2008; Kisi 2008; Shiri & Kisi 2010; Sreekanth *et al.* 2011; Moosavi *et al.* 2013; Zare & Koch 2018). In the present study, the first-order Sugeno fuzzy model, a deterministic system of output equations, was used; it is a valuable and efficient method for parameter estimation (Takagi & Sugeno 1985).
The fuzzy structure of the ANFIS model with the Sugeno approach consists of five layers, as shown in Figure 5 (Jang 1993). Figure 4 presents a FIS with two inputs, *x*_{1} and *x*_{2}, and one output *y*. The first-order Sugeno fuzzy approach consists of a typical rule set containing two fuzzy if-then rules, which can be described as follows (Shiri *et al.* 2013b):

Rule 1: if *x*_{1} is *A*_{1} and *x*_{2} is *B*_{1}, then *y*_{1} = *p*_{1}*x*_{1} + *q*_{1}*x*_{2} + *r*_{1}

Rule 2: if *x*_{1} is *A*_{2} and *x*_{2} is *B*_{2}, then *y*_{2} = *p*_{2}*x*_{1} + *q*_{2}*x*_{2} + *r*_{2}

*A*_{1}, *A*_{2} and *B*_{1}, *B*_{2} are the MFs of inputs *x*_{1} and *x*_{2}, respectively, and *p*_{1}, *q*_{1}, *r*_{1} and *p*_{2}, *q*_{2}, *r*_{2} are the adaptable consequent variables of the linear combinations. Appropriate values of these variables are determined by the least-squares method. The IF part is fuzzy, while the THEN part is a crisp function of the antecedent variables. The ANFIS process, whose structure includes five layers, can be expressed in the following steps:

Layer 1: every node *i* in this layer generates membership grades of the inputs based on the fuzzy set it belongs to, employing MFs, and provides the input values and MFs to the next layer:

$$O_i^1 = \mu_{A_i}(x_1), \qquad O_i^1 = \mu_{B_i}(x_2)$$

where *x*_{1} and *x*_{2} are the inputs to the *i*th node and $O_i^1$ is the output of layer 1. *A*_{i} and *B*_{i} are fuzzy labels (excellent, average, and inappropriate) specified by the suitable MFs associated with this node function. Various types and shapes of MFs can be used in this step. Each MF is a curve that maps each point of a specified input partition to a membership grade. The choice of the number and shape of the MFs has a significant effect on the computational complexity and performance of the ANFIS-based model. MFs can be triangular (MFtri), trapezoidal (MFtrap), Gaussian (MFgauss), bell-shaped (MFgbell), or other shapes (Jang 1993). MFgauss and MFgbell are generally used to describe the MFs for *A* and *B*. In the present study, Gaussian MFs are used and defined as follows:

$$\mu(x) = \exp\!\left(-\frac{(x - c)^2}{2\sigma^2}\right)$$

where *c* is the center of the MF and *σ* is the factor that determines its width. In fact, *c* and *σ* are the parameter set (premise parameters) of the MF in the premise part of the fuzzy if-then rules (Jang 1993).

Layer 2: this layer computes the firing strength of each rule (*w*_{i}). According to Figure 5, it includes circle nodes labeled Π, which multiply the input signals and transmit the product to the next layer; the outputs of this layer are:

$$O_i^2 = w_i = \mu_{A_i}(x_1)\,\mu_{B_i}(x_2), \qquad i = 1, 2$$

in which $O_i^2$ is the output of the second layer. Each node output indicates the firing strength of a rule, computed as the fuzzy AND (product) of the input signals.

Layer 3: every node in this layer, labeled *N*, is a fixed node and plays a normalizing role in the network. The primary purpose of this layer is to compute the ratio of each *i*th rule's firing strength to the sum of all the rules' firing strengths:

$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2$$

where $O_i^3$ is the output of the third layer. Each node in the third layer computes the normalized precondition conformity of the fuzzy rules.

*i*th node's output in the fourth layer. The node outputs feed-forward until layer 4, and the output MF parameters are reconnaissance by the least-squares procedure. In the back-propagation, the output MF parameters are kept constant, the error signals propagate backward, and the gradient descent approach corrects the input MF parameters.

Finally, an adaptive network is formed through previous processes, which is functionally equivalent to the first-order Sugeno FIS. The task of the learning algorithm in the ANFIS model is to use the input and output information units and determine the adjustable parameters, which is the overall aim of ANFIS. These algorithms imitate and simulate processes in a data set by determining MF variables (Jang 1993). The purpose of the learning algorithm is to tune altogether the amendable parameters, i.e., *σ _{i}*,

*c*,

_{i}*p*,

_{i}*q*, and

_{i}*r*, to determine the appropriate values of these variables so that the ANFIS output match with the target data. The ANFIS theory can be studied in detail in the relevant technical literature for further survey (e.g., Jang 1993; Jang & Sun 1995; Jang

_{i}*et al.*1997).
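To make the layer-by-layer computation concrete, the following minimal Python sketch evaluates a two-input, two-rule first-order Sugeno system of the kind ANFIS trains. All parameter values are illustrative, not fitted to the study's GWL data.

```python
import math

def gaussmf(x, c, sigma):
    # Gaussian membership function with premise parameters c (center) and sigma (width)
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership degrees of the crisp inputs x and y
    mu_a = [gaussmf(x, c, s) for (c, s) in premise["A"]]
    mu_b = [gaussmf(y, c, s) for (c, s) in premise["B"]]
    # Layer 2: firing strengths w_i = mu_Ai(x) * mu_Bi(y) (product = fuzzy AND)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths w_bar_i = w_i / sum(w)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order consequents w_bar_i * (p_i*x + q_i*y + r_i)
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output is the sum of the weighted rule outputs
    return sum(rule_out), w_bar

# Illustrative premise (c, sigma) and consequent (p, q, r) parameters
premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]
out, w_bar = anfis_forward(0.3, 0.7, premise, consequent)
```

In a trained ANFIS, the premise parameters would be adjusted by gradient descent and the consequent parameters by least squares, as described above.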

### Wavelet transform

The wavelet transform (WT) was introduced in the early 1980s and has developed considerably over recent decades; it has proven to be an efficient tool for analyzing non-stationary time series (e.g., Shiri & Kisi 2010; Ebrahimi & Rajaee 2017; Yu *et al.* 2018). The WT is an important derivative of the Fourier transform (FT). Its superiority over the FT is that it simultaneously retains information about the time, location, and frequency content of a time series, whereas the FT provides information only on frequency (Misiti *et al.* 1997). Numerous studies have shown that preprocessing data with the WT improves the performance of AI models such as ANN, RBF, and ANFIS for various forecasts, including precipitation (Belayneh *et al.* 2014), river flow (Adamowski & Sun 2010; Tayyab *et al.* 2019), river water temperature (Piotrowski *et al.* 2015; Graf *et al.* 2019), and GWL (Adamowski & Chan 2011; Zare & Koch 2018). In the WT, the data are decomposed into subseries at several resolution levels using the wavelet function (called the mother wavelet). The decomposition level of the WT can be chosen to match the periodicity of the time-series data (Moosavi *et al.* 2013). One classical type of WT is the continuous wavelet transform (CWT), which generates a large amount of data and can operate at every time scale. The CWT contains redundant information, which makes data analysis more difficult and requires considerable computational time (Adamowski 2008).

The wavelet family is obtained by scaling and shifting the mother wavelet:

*ψ_{a,b}*(*t*) = (1/√*a*) *ψ*((*t* − *b*)/*a*)

where the parameter ‘*a*’ is a positive number and the parameter ‘*b*’ is any real number; *ψ_{a,b}*(*t*) is the wavelet function; *a* is the frequency or scale (dilation) parameter; and *b* is the translation or shift parameter.

The CWT of a signal *x*(*t*) is as follows (Percival & Walden 2006):

CWT(*a*, *b*) = (1/√*a*) ∫ *x*(*t*) *ψ**((*t* − *b*)/*a*) d*t*

where CWT(*a*, *b*) is the wavelet coefficient and ‘*’ denotes the complex conjugate.

The CWT produces coefficients over a continuum of scales and translations (*a* and *b*). Analyzing a signal employing all wavelet coefficients is computationally infeasible. Nevertheless, if the scale and translation are selected based on powers of two (dyadic scales and translations), the signal can still be reconstructed sufficiently from the corresponding wavelet coefficients despite a significant decrease in the amount of data. As a result, the decomposed signals of this method are more efficient for analysis. This is provided by the DWT, whose wavelet function can be defined and calculated as follows (Mallat 1989):

*ψ_{m,n}*(*t*) = *a*₀^(−*m*/2) *ψ*((*t* − *n* *b*₀ *a*₀^*m*)/*a*₀^*m*)

where *m* and *n* are integers that tune the scale (dilation) and the translation (shift), respectively; *a*₀ is a fixed scale step whose value is greater than one; and *b*₀ is the location parameter whose value is greater than zero. The most common choices for these parameters are *a*₀ = 2 and *b*₀ = 1 (Mallat 1989).

With these choices, the scale ‘*a*’ and shift ‘*b*’ in the mother wavelet take power-of-two values, i.e., scale *a* = 2^*m* and location *b* = 2^*m* *n*. The dyadic wavelet can then be written more compactly as follows (Mallat 1989):

*ψ_{m,n}*(*t*) = 2^(−*m*/2) *ψ*(2^(−*m*) *t* − *n*)

Wavelets are described by the wavelet function *ψ*(*t*) (called the mother wavelet) and the scaling function *φ*(*t*) (i.e., the father wavelet) in the time domain. The mother and father wavelets retain the characteristics of the frequency domain and time domain, respectively.
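The relationship between the father-wavelet (low-pass) and mother-wavelet (high-pass) filters can be illustrated with the Haar wavelet, the simplest concrete case; the db, sym, coif, bior, and fk wavelets used in this study have longer filters but obey the same mirror relation. The following sketch is illustrative only.

```python
import math

# Low-pass (father-wavelet / scaling) filter of the Haar wavelet
h = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# High-pass (mother-wavelet) filter from the quadrature mirror relation
# g(n) = (-1)^n * h(L - 1 - n)
L = len(h)
g = [(-1) ** n * h[L - 1 - n] for n in range(L)]
```

The resulting high-pass filter has zero mean (it responds to detail, not trend), while the low-pass filter has unit energy.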

For a discrete time series *x*(*t*), which occurs at discrete time *t*, the DWT can be calculated as follows:

*W_{m,n}* = 2^(−*m*/2) Σ_*t* *x*(*t*) *ψ*(2^(−*m*) *t* − *n*)

where *W_{m,n}* is the wavelet coefficient at scale *a* = 2^*m* and location *b* = 2^*m* *n*. In this way, *x*(*t*) is decomposed by the DWT into several finite subsets, and because the DWT is a multi-resolution analysis, reducing the time resolution causes no problem. Hence, the multi-resolution decomposition of the sampled signal removes the redundant data of the CWT while adequately and optimally extracting the signal properties at a specific scale under certain conditions. In this framework, the wavelet function is defined as a filter set with filter coefficients *g*(*n*). However, the scale cannot be infinitely large. Because the signal scale and time are limited, a companion equation is used to solve this problem; it is defined similarly to the wavelet function *ψ*(*t*) but is derived from a father wavelet *φ*(*t*). In the multi-resolution analysis, these functions must satisfy the two-scale relations, defined as follows (Mallat 1989):

*φ*(*t*) = √2 Σ_*n* *h*(*n*) *φ*(2*t* − *n*)

*ψ*(*t*) = √2 Σ_*n* *g*(*n*) *φ*(2*t* − *n*)

where *h*(*n*) and *g*(*n*) are the filter coefficients, which, because of orthonormality, are related to each other as mirror filters:

*g*(*n*) = (−1)^*n* *h*(1 − *n*)

Finally, the decomposition can be inverted to reconstruct the *x*(*t*) signal:

*x*(*t*) = *a*_{jmax}(*t*) + Σ_{j=1}^{jmax} *d_j*(*t*)

where *j*_max is the maximum decomposition level, *a*_{jmax} is the signal approximation at the highest level of resolution (low frequency), and *d_j* contains the signal details of the low scale (high frequency), computed recursively for the different levels (*j* = 1, 2, …, *j*_max). The multi-level decomposition of the *x*(*t*) signal thus consists of the approximation at the highest level and the details at each level (Zare & Koch 2018). As given in Equation (25), the original signal can be reconstructed from these subsignals.
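The additive reconstruction from approximation and detail subsignals can be demonstrated with a minimal pure-Python sketch using the orthonormal Haar wavelet (the study itself uses longer db/sym/coif/bior/fk filters, for which a library such as PyWavelets would normally be used).

```python
import math

S = 1 / math.sqrt(2)

def haar_step(x):
    # One DWT level (orthonormal Haar): approximation and detail coefficients
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    # Invert one DWT level
    x = []
    for a, d in zip(approx, detail):
        x += [(a + d) * S, (a - d) * S]
    return x

def decompose(x, levels):
    # Multi-level DWT: returns [a_jmax, d_jmax, ..., d_1]
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)
    return [x] + details[::-1]

def subsignals(x, levels):
    # Reconstruct each coefficient set on its own so that
    # x(t) = a_jmax(t) + d_jmax(t) + ... + d_1(t)
    coeffs = decompose(x, levels)
    subs = []
    for k in range(len(coeffs)):
        kept = [c if i == k else [0.0] * len(c) for i, c in enumerate(coeffs)]
        a = kept[0]
        for d in kept[1:]:
            a = haar_inverse(a, d)
        subs.append(a)
    return subs

# Hypothetical signal of dyadic length (not the study's GWL records)
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
subs = subsignals(x, 2)                 # [a_2(t), d_2(t), d_1(t)]
recon = [sum(v) for v in zip(*subs)]    # additive reconstruction
```

By linearity of the transform, summing the subsignals recovers the original series exactly, which is the property Equation (25) expresses.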

### Combining ANN, RBF, and ANFIS models with DWT and M-DWT

In the last decade, hybrid modeling methods have expanded significantly, and in particular, the interest of a wide range of researchers in wavelet-AI techniques for GWL modeling has increased (e.g., Moosavi *et al.* 2013; Ebrahimi & Rajaee 2017; Yu *et al.* 2018; Zare & Koch 2018; Zhang *et al.* 2019). The results of these studies show the superior performance of coupled models over single models in accurately forecasting GWL. The hybrid wavelet-AI model is designed to model nonlinear and noisy processes. The choice of a suitable mother wavelet plays an essential role in wavelet-AI modeling (Nourani *et al.* 2009). Time series of hydrological phenomena have different characteristics, owing to their complexity and the many parameters that affect them, and can exhibit long-term features, short-term features, or various combinations of the two. Hence, an appropriate mother wavelet, with compact or broad support, can cover those processes and provide better forecasts (Maheswaran & Khosa 2012). Therefore, a combination of different mother wavelets for decomposing the GWL time series appears more suitable, with better coverage of and compatibility with different time-series shapes. In the present study, to build the hybrid models, the DWT with a single mother wavelet and the M-DWT with several mother wavelets applied simultaneously, drawn from the Daubechies, Symlet, Coiflet, Biorthogonal, and Fejer-Korovkin families (with the symbols db, sym, coif, bior, and fk, respectively), were used as data preprocessing. The data decomposed by the wavelet transform in several combinations were then imported into the AI models (ANN, RBF, and ANFIS) to form the DWT-AI and M-DWT-AI models.
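The core of the M-DWT preprocessing, decomposing the same series with several mother wavelets at once and concatenating all subseries as model inputs, can be sketched as follows. Here Haar and db2 filters stand in for the study's five wavelet families, and the GWL values are hypothetical; this is an illustrative one-level version, not the study's implementation.

```python
import math

def qmf(h):
    # High-pass filter from the low-pass filter: g(n) = (-1)^n h(L-1-n)
    L = len(h)
    return [(-1) ** n * h[L - 1 - n] for n in range(L)]

def dwt_level(x, h):
    # One periodized analysis step: filter with the low- and high-pass
    # filters and downsample by two
    g = qmf(h)
    N, L = len(x), len(h)
    approx = [sum(h[k] * x[(i + k) % N] for k in range(L)) for i in range(0, N, 2)]
    detail = [sum(g[k] * x[(i + k) % N] for k in range(L)) for i in range(0, N, 2)]
    return approx, detail

# Two mother wavelets applied simultaneously (the M-DWT idea)
HAAR = [1 / math.sqrt(2)] * 2
R3 = math.sqrt(3)
DB2 = [(1 + R3) / (4 * math.sqrt(2)), (3 + R3) / (4 * math.sqrt(2)),
       (3 - R3) / (4 * math.sqrt(2)), (1 - R3) / (4 * math.sqrt(2))]

def mdwt_features(x):
    # Decompose the same series with each mother wavelet and concatenate
    # all sub-series into one input vector for the AI model
    feats = []
    for h in (HAAR, DB2):
        a, d = dwt_level(x, h)
        feats += a + d
    return feats

gwl = [101.2, 101.0, 100.7, 100.9, 101.4, 101.1, 100.8, 100.6]
features = mdwt_features(gwl)
```

Each wavelet contributes a complete (energy-preserving) description of the signal, so combining families with different filter lengths gives the downstream model several complementary views of the same record.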

### Methodological limitations

It is worth noting that selecting the optimal M-DWT configuration is more time-consuming than selecting a single DWT. Moreover, the M-DWT technique produces more decomposed sub-time-series than the DWT, which increases both the number of model inputs and the computation time. Therefore, if the number of time-series samples is large, the processing and run time of the model grows considerably. This may limit the applicability of the approach to large data sets, typically long daily and hourly records.

## RESULTS AND DISCUSSION

This study aims to apply different AI models, including ANFIS, ANN, and RBF, to provide more accurate forecasts of GWL up to 6 months beyond the data records. Data from natural phenomena often contain different features and are mostly multi-frequency. In the present study, the M-DWT technique was used to extract these features, enabling the AI models to learn the underlying processes better and thus forecast the event more accurately. For that purpose, the network architecture of each model and the corresponding input combination were selected optimally and used to forecast GWL fluctuations 1-, 2-, 3-, and 6-months ahead. In the next step, the performance of the ANN, RBF, and ANFIS models and of their hybrids with the DWT and M-DWT was evaluated for GWL forecasting.

It is worth noting that the AI models and their hybrids with the DWTs were coded in MATLAB R2018. The optimal AI network architecture, parameter tuning, decomposition level, and the type and number of wavelets in the WTs were determined using a trial-and-error method.

Choosing a suitable architecture for the AI models (i.e., ANN, RBF, and ANFIS) is a fundamental step in modeling because improper architecture can lead to under-/over-fitting and under-/over-computing problems. In addition, when building data-driven models such as ANN, RBF, and ANFIS, special attention must be paid to the choice of suitable inputs, which can improve model efficiency in both the calibration (training) and verification (testing) steps. To obtain valid multi-time-step-ahead GWL forecasts, the layers of a network must be organized in such a way that they encompass all the information related to the target data. In Table 1, the effects of choosing different architectures can be evaluated. Tables 1–3 show several combinations of input data and different architectures for the DWT-AI and M-DWT-AI models as examples, with the optimal values of the AI structure and input data highlighted; the bolded values in these tables mark the most accurate models for forecasting groundwater levels. The varying patterns and behavior of GWL in different regions of the plain can make the average GWL fluctuations heterogeneous. Thus, integrating such data into a data-driven model may introduce noisy data and superfluous information into the model. These models are highly dependent on the quality and quantity of the data used; hence, such non-integrated data can lead to poor modeling results. However, in this study, the average GWL measured at 17 piezometers with suitably long records in the Chamchamal-Bisotun plain was used, and satisfactory results were obtained in forecasting GWL 1-, 2-, 3-, and 6-months ahead.

| Number of delays (month) | Network structure | Output variable | RMSE (m), calibration | RMSE (m), verification | R, calibration | R, verification |
|---|---|---|---|---|---|---|
| 1 | 1-3-1 | GWL (t + 1) | 0.48395 | 0.78233 | 0.56000 | 0.85025 |
| 1, 2, and 3 | 3-3-1 | GWL (t + 1) | 0.38632 | 0.76993 | 0.97445 | 0.86408 |
| 1, 2, …, 12 | 12-3-1 | GWL (t + 1) | 0.32013 | 0.92912 | 0.98349 | 0.83927 |
| 1 | 1-6-1 | GWL (t + 1) | 0.47997 | 0.78618 | 0.96066 | 0.84922 |
| 1, 2, and 3 | 3-6-1 | GWL (t + 1) | 0.37560 | 0.78012 | 0.97586 | 0.86464 |
| 1, 2, …, 12 | 12-6-1 | GWL (t + 1) | 0.27457 | 1.12090 | 0.98788 | 0.79154 |
| 1 | 1-4-2-1 | GWL (t + 1) | 0.47932 | 0.77378 | 0.96077 | 0.85248 |
| 1, 2, and 3 | 3-4-2-1 | GWL (t + 1) | 0.35746 | 0.78409 | 0.97816 | 0.85428 |
| 1, 2, …, 12 | 12-4-2-1 | GWL (t + 1) | 0.30110 | 1.07980 | 0.98540 | 0.77982 |
| 1 | 1-2-5-1 | GWL (t + 1) | 0.47902 | 0.77140 | 0.96082 | 0.85465 |
| 1, 2, and 3 | 3-2-5-1 | GWL (t + 1) | 0.38062 | 0.75973 | 0.97520 | 0.86118 |
| 1, 2, …, 12 | 12-2-5-1 | GWL (t + 1) | 0.35988 | 0.64707 | 0.97908 | 0.90676 |
| 1 | 1-1-12-1 | GWL (t + 1) | 0.46883 | 0.86312 | 0.96251 | 0.83202 |
| 1, 2, and 3 | 3-1-12-1 | GWL (t + 1) | 0.44382 | 0.80380 | 0.96213 | 0.87225 |
| 1, 2, …, 12 | 12-1-12-1 | GWL (t + 1) | 0.40566 | 0.60521 | 0.97334 | 0.90953 |
| 1 | 1-10-5-1 | GWL (t + 1) | 0.47934 | 0.77423 | 0.96077 | 0.85228 |
| 1, 2, and 3 | 3-10-5-1 | GWL (t + 1) | 0.34769 | 0.76858 | 0.97935 | 0.86604 |
| 1, 2, …, 12 | 12-10-5-1 | GWL (t + 1) | 0.07523 | 1.95660 | 0.99910 | 0.43187 |

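The input combinations in Table 1 (delays of 1, of 1–3, and of 1–12 months) amount to building a lagged input matrix from the GWL record. A minimal sketch, using hypothetical monthly GWL values rather than the study's records, is:

```python
def lagged_inputs(series, delays):
    # Build (input, target) pairs: GWL(t) is forecast from GWL(t-d) for each
    # delay d, mirroring input combinations such as "1, 2, ..., 12"
    m = max(delays)
    rows, targets = [], []
    for t in range(m, len(series)):
        rows.append([series[t - d] for d in delays])
        targets.append(series[t])
    return rows, targets

# Hypothetical monthly GWL values (the study's records are not reproduced here)
gwl = [100.5, 100.8, 101.0, 100.9, 100.4, 100.2, 100.6, 101.1,
       101.3, 101.0, 100.7, 100.5, 100.6, 100.9, 101.2]
X, y = lagged_inputs(gwl, delays=list(range(1, 13)))
```

For a k-month-ahead forecast, the target would be shifted to `series[t + k - 1]` while the inputs stay the same; the 12-delay variant corresponds to the 12-neuron input layer of the 12-1-12-1 network.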

On the other hand, most sampled hydrological observations, such as GWL, contain noise and redundant information. Therefore, decomposing the data or eliminating its noise is another fundamental step in modeling hydrological processes. A DWT-based method for decomposition and noise reduction can improve model performance, provided an appropriate mother wavelet and a reliable decomposition level are used.

It is noteworthy that the chosen type of mother wavelet can have a significant impact on network learning and model output, and the importance of choosing and combining suitable mother wavelets increases when several wavelets are used simultaneously. Accordingly, in this study, to obtain rational outcomes, the type of mother wavelet and the appropriate decomposition level were determined by trial and error, and their effect on the models' efficiency was examined and compared. For example, based on the RMSE values in Table 2, the db45 wavelet with a decomposition level of 2 performed better than the other wavelets and decomposition levels. The subseries obtained by applying the DWT, through its high-pass and low-pass filters, to the original time-series data were used as input to the AI models. For example, Figure 7 shows the approximations and details of the GWL time series decomposed by the db8 mother wavelet at level 2. The decomposed GWL time series corresponding to each input combination was used as model input to forecast GWL. As shown in Figure 8, using multiple wavelets with different filter lengths simultaneously can capture various parts of the signal (the GWL time-series data), which further helps the data-driven AI models to learn.

| Mother wavelets type | ANN, calibration | ANN, verification | RBF, calibration | RBF, verification | ANFIS, calibration | ANFIS, verification |
|---|---|---|---|---|---|---|
| None (no wavelet) | 0.40566 | 0.60521 | 0.39877 | 0.62926 | 0.39636 | 0.62515 |
| sym3 | 0.10939 | 0.22535 | 0.10123 | 0.26083 | 0.09005 | 0.21966 |
| sym8 | 0.04097 | 0.08063 | 0.03662 | 0.10250 | 0.03897 | 0.07963 |
| sym30 | 0.01170 | 0.01949 | 0.01769 | 0.03968 | 0.01654 | 0.02764 |
| db4 | 0.09551 | 0.16892 | 0.08283 | 0.22836 | 0.07889 | 0.18760 |
| db8 | 0.03678 | 0.08657 | 0.03432 | 0.09534 | 0.03533 | 0.08345 |
| db45 | 0.00812 | 0.01722 | 0.01462 | 0.03454 | 0.01415 | 0.02859 |
| db45 and sym10 | 0.00569 | 0.01621 | 0.00975 | 0.03588 | 0.00852 | 0.08445 |
| db8 and sym4 | 0.00948 | 0.02414 | 0.00866 | 0.03627 | 0.00869 | 0.03343 |
| db45 and coif2 | 0.00604 | 0.01773 | 0.00992 | 0.03902 | 0.01118 | 0.02556 |
| db45 and sym10 and coif2 | 0.00108 | 0.00362 | 0.00281 | 0.01923 | 0.00384 | 0.01849 |
| db8 and sym4 and coif5 | 0.00183 | 0.00435 | 0.00251 | 0.01457 | 0.00312 | 0.01094 |
| db45 and sym4 and coif2 and bior5.5 | 0.00024 | 0.00107 | 0.00118 | 0.00807 | 0.00219 | 0.01194 |
| db45 and sym4 and coif5 and db4 | 0.00057 | 0.00257 | 0.00256 | 0.01908 | 0.00223 | 0.01265 |
| db45 and sym4 and coif5 and fk6 | 0.00087 | 0.00279 | 0.00339 | 0.02468 | 0.00212 | 0.00867 |
| db25 and sym10 and coif2 and bior5.5 and fk6 | 0.00009 | 0.00061 | 0.00069 | 0.00803 | 0.00154 | 0.10257 |
| db45 and sym10 and coif2 and bior5.5 and fk6 | 0.00008 | 0.00075 | 0.00088 | 0.01468 | 0.00168 | 0.01051 |
| db4 and sym4 and coif5 and bior6.8 and fk22 | 0.00012 | 0.00108 | 0.00075 | 0.01056 | 0.00150 | 0.00772 |
| db8 and sym4 and coif5 and bior5.5 and fk6 | 0.00001 | 0.00011 | 0.00098 | 0.01178 | 0.00168 | 0.00891 |
| db8 and sym5 and coif5 and bior6.8 and fk6 | 0.00008 | 0.00069 | 0.00071 | 0.00679 | 0.00151 | 0.01058 |
| db8 and sym4 and coif5 and bior6.8 and fk6 | 0.00002 | 0.00021 | 0.00058 | 0.00833 | 0.00135 | 0.10724 |
| db8 and sym4 and coif4 and bior6.8 and fk6 | 0.00019 | 0.00128 | 0.00075 | 0.00861 | 0.00145 | 0.00827 |
| db8 and sym4 and coif5 and bior6.8 and fk6 and db4 | 0.00008 | 0.00087 | 0.00049 | 0.01159 | 0.00108 | 0.11207 |


Numerous studies have shown that different AI models function well in forecasting time series and GWL (e.g., Shiri *et al.* 2013a; Gong *et al.* 2016), and the results of this study are consistent with those findings. In recent years, many studies have used the DWT to preprocess data in hybrid AI models, and researchers have acknowledged the improved performance of DWT-AI hybrid models over single AI models in predicting hydrological problems (e.g., Ebrahimi & Rajaee 2017; Band *et al.* 2021), although model accuracy decreases at longer lead times. The results of the present study are consistent with the findings of the above studies regarding the performance of DWT-AI coupled models.

In the present study, the M-DWT-AI models were developed by hybridizing the new M-DWT technique with the AI models. The results show a significant improvement for all of the M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models based on the performance criteria (i.e., RMSE, NRMSE, NSE, and R). For example, the hybrid M-DWT-RBF model performs better than the RBF and single DWT-RBF models, as clearly shown in Figures 11–13. Overall, the results are presented and evaluated in the following subsections.

### Results of ANN models and their hybrid with DWT and M-DWT

As mentioned in the prior section, the data sets in this study were divided into two subsets: the training (calibration) and testing (verification) sets, which are used to train the models and to evaluate their performance by the performance criteria (i.e., RMSE, NSE, and R). Meanwhile, the test error is more revealing than the training error because the model is evaluated on values unknown to it (Tapoglou *et al.* 2014). Different algorithms can be used to train an ANN. In the present study, various training algorithms, including LM, BR, SCG, and GDX, were applied, and the BR back-propagation algorithm was selected as the best according to performance criteria such as R and RMSE. Finally, the BR algorithm, one of the most widely used training algorithms (e.g., Daliakopoulos *et al.* 2005; Mohanty *et al.* 2010; Moosavi *et al.* 2013), was used to train the ANNs and the developed models.
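The study trained its networks with MATLAB's LM, BR, SCG, and GDX algorithms, which are not reproduced here. The following pure-Python fragment only illustrates the squared-weight (weight-decay) penalty that is central to Bayesian regularization, applied to a toy linear model by plain gradient descent; BR itself additionally adapts the penalty strength automatically, which this sketch does not do.

```python
def train_gd(X, y, lam=0.01, lr=0.05, epochs=500):
    # Gradient descent on MSE + lam * ||w||^2; the squared-weight penalty is
    # the regularization idea that Bayesian regularization builds on
    n = len(X)
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(n_feat):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * (gj + 2 * lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data with a roughly unit slope (illustrative only)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.1, 1.1, 1.9, 3.0]
w, b = train_gd(X, y)
```

The penalty shrinks the fitted weights slightly below the unregularized least-squares values, which is what discourages over-fitting in the calibration stage.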

It should be noted that diverse input combinations were used, and for each of these inputs, three- and four-layer ANN structures were considered, comprising an input layer, one or two hidden layers, and an output layer. Also, 1 to 20 neurons per layer were tried to achieve the optimal network. Table 1 reports R and RMSE (*m*) for several different ANN architectures as examples and shows the best input combination, i.e., delays of 1–12 months, and the best network structure (i.e., 12-1-12-1). The suitability of these delays may stem from the seasonality of the GWL fluctuation process; the ANN models use past GWL as input to forecast current GWL. The optimal number of neurons in the hidden layers (1–12 neurons) was used for forecasts at the different time steps, and the model with the least RMSE in the verification stage was selected as the optimal network (for example, RMSE = 0.605 and R = 0.909). From the viewpoint of least error in the verification stage, the best combination of inputs and network (i.e., 12-1-12-1) was chosen to forecast the GWL 1-month ahead. After determining the most suitable ANN structure, calibration is completed based on the performance criteria, and the weights stored in the trained network are applied in the verification (test) stage. It is worth mentioning that some network structures showed higher performance in the calibration stage, but their performance in the verification stage, according to the R and RMSE values, was not satisfactory (see Table 1). This outcome can be attributed to the model over-fitting the target data in the calibration stage. Table 3 presents the R, RMSE, NRMSE, and NSE values for the optimal ANN (MLP) models during the calibration (training) and verification (testing) periods at the different time steps.
Figure 9 shows the results of the best ANN structure and the best M-DWT-ANN structure for forecasting the monthly average GWL 1-, 2-, 3-, and 6-months ahead. The R and NSE values of the ANN model in the verification stage for the 1-, 2-, and 3-months ahead are greater than 0.84 and 0.69, respectively. Also, the RMSE and NRMSE values of the ANN model for these time steps are less than 0.81 and 12, respectively (Table 3).
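The four performance criteria used throughout Tables 1–3 can be computed as in the following sketch. The NRMSE normalization is assumed here to be the observed range (the section does not restate its definition), and the observed/simulated values are hypothetical.

```python
import math

def metrics(obs, sim):
    # RMSE in the data units; NRMSE as a percentage of the observed range;
    # Nash-Sutcliffe efficiency (NSE); Pearson correlation (R)
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    rmse = math.sqrt(sse / n)
    nrmse = 100 * rmse / (max(obs) - min(obs))
    nse = 1 - sse / sum((o - mean_o) ** 2 for o in obs)
    r = (sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
         / math.sqrt(sum((o - mean_o) ** 2 for o in obs)
                     * sum((s - mean_s) ** 2 for s in sim)))
    return rmse, nrmse, nse, r

# Hypothetical observed and simulated GWL values
obs = [100.2, 100.8, 101.1, 100.6, 100.3, 100.9]
sim = [100.3, 100.7, 101.0, 100.7, 100.4, 100.8]
rmse, nrmse, nse, r = metrics(obs, sim)
```

An NSE of 1 and an R of 1 indicate a perfect fit, while RMSE and NRMSE approach zero; this is the sense in which the verification-stage values in Table 3 are compared.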

| Forecasted | Model type | R, calibration | R, verification | RMSE (m), calibration | RMSE (m), verification | NRMSE (%), calibration | NRMSE (%), verification | NSE, calibration | NSE, verification |
|---|---|---|---|---|---|---|---|---|---|
| GWL (t + 1) | ANN | 0.97 | 0.91 | 0.40566 | 0.60521 | 5.28080 | 8.93830 | 0.95 | 0.83 |
| | DWT-ANN | 0.99 | 0.99 | 0.00812 | 0.01722 | 0.10694 | 0.25426 | 0.99 | 0.99 |
| | M-DWT-ANN | 0.99 | 0.99 | 0.00001 | 0.00011 | 0.00013 | 0.00159 | 0.99 | 0.99 |
| | RBF | 0.97 | 0.91 | 0.39877 | 0.62926 | 5.24900 | 9.29340 | 0.95 | 0.81 |
| | DWT-RBF | 0.99 | 0.99 | 0.01462 | 0.03455 | 0.19243 | 0.51017 | 0.99 | 0.99 |
| | M-DWT-RBF | 0.99 | 0.99 | 0.00071 | 0.00679 | 0.00934 | 0.10028 | 0.99 | 0.99 |
| | ANFIS | 0.97 | 0.90 | 0.39636 | 0.62515 | 5.21730 | 9.23260 | 0.95 | 0.82 |
| | DWT-ANFIS | 0.99 | 0.99 | 0.01415 | 0.02859 | 0.18624 | 0.42235 | 0.99 | 0.99 |
| | M-DWT-ANFIS | 0.99 | 0.99 | 0.00150 | 0.00772 | 0.01975 | 0.11400 | 0.99 | 0.99 |
| GWL (t + 2) | ANN | 0.95 | 0.87 | 0.55600 | 0.71500 | 7.20680 | 10.5705 | 0.90 | 0.76 |
| | DWT-ANN | 0.99 | 0.99 | 0.04340 | 0.08214 | 0.57125 | 1.21310 | 0.99 | 0.99 |
| | M-DWT-ANN | 0.99 | 0.99 | 0.00010 | 0.00061 | 0.00131 | 0.00898 | 0.99 | 0.99 |
| | RBF | 0.95 | 0.89 | 0.54417 | 0.75012 | 7.16290 | 11.0784 | 0.90 | 0.73 |
| | DWT-RBF | 0.99 | 0.99 | 0.06280 | 0.14092 | 0.82649 | 2.08130 | 0.99 | 0.99 |
| | M-DWT-RBF | 0.99 | 0.99 | 0.00327 | 0.03461 | 0.04307 | 0.51109 | 0.99 | 0.99 |
| | ANFIS | 0.95 | 0.87 | 0.53949 | 0.70568 | 7.10130 | 10.4221 | 0.91 | 0.76 |
| | DWT-ANFIS | 0.99 | 0.99 | 0.06141 | 0.12637 | 0.80836 | 1.86640 | 0.99 | 0.99 |
| | M-DWT-ANFIS | 0.99 | 0.99 | 0.00597 | 0.03065 | 0.07861 | 0.45270 | 0.99 | 0.99 |
| GWL (t + 3) | ANN | 0.93 | 0.84 | 0.65914 | 0.80756 | 8.67630 | 11.9268 | 0.86 | 0.69 |
| | DWT-ANN | 0.99 | 0.99 | 0.10000 | 0.19991 | 1.31270 | 2.95250 | 0.99 | 0.98 |
| | M-DWT-ANN | 0.99 | 0.99 | 0.00010 | 0.00065 | 0.00129 | 0.00966 | 0.99 | 0.99 |
| | RBF | 0.93 | 0.79 | 0.62637 | 1.06270 | 8.24500 | 15.6955 | 0.87 | 0.47 |
| | DWT-RBF | 0.99 | 0.98 | 0.12670 | 0.28138 | 1.66750 | 4.15570 | 0.99 | 0.96 |
| | M-DWT-RBF | 0.99 | 0.99 | 0.00063 | 0.07141 | 0.00828 | 1.05460 | 0.99 | 0.99 |
| | ANFIS | 0.93 | 0.85 | 0.63464 | 0.78369 | 8.35380 | 11.5743 | 0.87 | 0.71 |
| | DWT-ANFIS | 0.99 | 0.98 | 0.12095 | 0.24890 | 1.59210 | 3.67580 | 0.99 | 0.97 |
| | M-DWT-ANFIS | 0.99 | 0.99 | 0.01608 | 0.08416 | 0.21167 | 1.24300 | 0.99 | 0.99 |
| GWL (t + 6) | ANN | 0.88 | 0.70 | 0.81986 | 1.07080 | 10.7880 | 15.8100 | 0.78 | 0.46 |
| | DWT-ANN | 0.99 | 0.97 | 0.16347 | 0.31678 | 2.12160 | 4.67850 | 0.99 | 0.95 |
| | M-DWT-ANN | 0.99 | 0.99 | 0.12393 | 0.31525 | 1.62320 | 4.65580 | 0.99 | 0.95 |
| | RBF | 0.88 | 0.69 | 0.81918 | 1.30790 | 10.7830 | 19.3164 | 0.78 | 0.19 |
| | DWT-RBF | 0.98 | 0.89 | 0.29646 | 0.95704 | 3.90230 | 14.1344 | 0.97 | 0.57 |
| | M-DWT-RBF | 0.99 | 0.95 | 0.06137 | 0.49937 | 0.80652 | 7.37510 | 0.99 | 0.88 |
| | ANFIS | 0.88 | 0.77 | 0.82903 | 1.00920 | 10.9126 | 14.8728 | 0.78 | 0.52 |
| | DWT-ANFIS | 0.98 | 0.93 | 0.32372 | 0.54220 | 4.26110 | 8.03760 | 0.96 | 0.86 |
| | M-DWT-ANFIS | 0.99 | 0.97 | 0.08227 | 0.34482 | 1.08300 | 5.09260 | 0.99 | 0.94 |


The DWT-ANN and M-DWT-ANN hybrid models combine the ANN model with wavelet decomposition and were used to minimize forecasting errors. The past GWL data were preprocessed by wavelet decomposition, and the appropriate sub-time-series were then used as input for GWL forecasting. Figures 9 and 10 show scatter and time-series plots comparing the observed and calculated GWLs, and Table 3 reports the R, RMSE, NRMSE, and NSE values of the DWT-ANN and M-DWT-ANN models for the forecast time steps. In the verification stage, for the 1-, 2-, and 3-months ahead, the NSE and R values are close to 1, and the RMSE and NRMSE values are less than 0.2 and 3 for the DWT-ANN model and less than 0.0007 and 0.01 for the M-DWT-ANN model, respectively. All models forecasted GWLs for 1-, 2-, and 3-months ahead correctly, but for 6-months ahead, R and NSE decreased and RMSE and NRMSE increased significantly in the study area (see Table 3). Nevertheless, the 1-, 2-, and 3-months-ahead forecasts of the M-DWT-ANN model surpass even those of the models preprocessed with the single DWT. Comparing the three models in terms of R, RMSE, NRMSE, and NSE, the M-DWT-ANN model performs better than the ANN and DWT-ANN models. The best ANN, DWT-ANN, and M-DWT-ANN models were used to forecast GWL, and Figures 9 and 10 clearly show the ability of the wavelet-neural network model to learn the nonlinear relationship between input and target data. Overall, the M-DWT-ANN model's forecasts at the different time steps are superior to those of the ANN and DWT-ANN models.

### Results of RBF models and their hybrid with DWT and M-DWT

Similar to the other models, the data sets were divided into training and testing subsets for the RBF, DWT-RBF, and M-DWT-RBF models in the GWL forecast. The same input structures and delays used for the ANN model were introduced to the RBF model. The results show the high proficiency of the presented model for the 1-month-ahead forecast, with relatively little error and high correlation, which illustrates the appropriate performance of the RBF. For the 2- and 3-months-ahead forecasts, the general trend of the 1-month-ahead forecast was maintained, with less accuracy. Table 3 presents the R, RMSE, NRMSE, and NSE values of the RBF model for each forecast horizon. In the verification stage, the RBF model's R and NSE values for the 1-, 2-, and 3-months-ahead forecasts are more than 0.79 and 0.46, respectively, and its RMSE and NRMSE values for these horizons are less than 1.1 and 15.7, respectively. The 6-months-ahead forecasts are the least accurate, although still relatively good; increasing the forecast interval reduced the accuracy of the forecasts.
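For readers unfamiliar with RBF networks, the following minimal NumPy sketch shows the core idea: Gaussian hidden units centered on training samples, with linear output weights solved by least squares. This is an illustrative implementation, not the one used in the study; the random center selection and fixed spread are assumptions.

```python
import numpy as np

def rbf_fit_predict(X_train, y_train, X_test, n_centers=10, sigma=1.0, seed=0):
    """Minimal Gaussian RBF network: centers drawn from the training set,
    a fixed spread, and output weights solved by linear least squares."""
    rng = np.random.default_rng(seed)
    centers = X_train[rng.choice(len(X_train), n_centers, replace=False)]

    def phi(X):
        # Gaussian activation for every (sample, center) pair.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    H = np.column_stack([phi(X_train), np.ones(len(X_train))])  # bias column
    w, *_ = np.linalg.lstsq(H, y_train, rcond=None)
    return np.column_stack([phi(X_test), np.ones(len(X_test))]) @ w

# Toy usage on a random regression problem.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X.sum(axis=1)
pred = rbf_fit_predict(X, y, X, n_centers=20, sigma=2.0)
```

In practice the lagged GWL values (and, for the hybrids, the wavelet subseries) would take the place of `X`.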

In the RBF model, the maximum number of training epochs equals the number of training samples. Using wavelet analysis, especially several mother wavelets simultaneously, increases the information supplied to the model and improves the performance of the RBF model over more epochs, as Figure 11 clearly shows.

As shown in Table 3, the performance of the best combination of multiple simultaneous mother wavelets (i.e., M-DWT) is much better than that of the best single mother wavelet (DWT), especially for the 1-, 2-, and 3-months-ahead forecasts. The R, RMSE, NRMSE, and NSE values of the DWT-RBF and M-DWT-RBF models for each forecast horizon are shown in Table 3. In the verification stage, for the 1-, 2-, and 3-months-ahead forecasts, the NSE and R values are close to 1; the RMSE and NRMSE values are less than 0.3 and 4.2, respectively, for the DWT-RBF model, and less than 0.08 and 1.1, respectively, for the M-DWT-RBF model. These results clearly illustrate the high ability of the hybrid model to forecast GWL fluctuations. The M-DWT-RBF model performs better than the other models in terms of R, RMSE, NRMSE, and NSE. The 1-month-ahead forecast of each model is shown as hydrographs and scatter plots in Figures 12 and 13, which indicate that the 1-month forecast of the M-DWT-RBF model is better than those of the other models; the figures also show the observed GWL versus the GWL forecasted by the best models. As shown in Figures 12 and 13, the WT-preprocessed models perform better than the individual models. Figure 12 presents the measured and forecasted GWL values and shows the excellent ability of the M-DWT-RBF technique in forecasting GWL 1, 2, 3, and 6 months ahead. Overall, the results indicate that the M-DWT-RBF models are more suitable than the single RBF and DWT-RBF models for GWL forecasting, and the high proficiency of the proposed model in the multi-month-ahead forecast illustrates the appropriate performance of the hybrid RBFs.

### Results of ANFIS models and their hybrid with DWT and M-DWT

Similarly, the same input structures and delays introduced to the ANN and RBF models were used for the ANFIS model to simulate the aquifer's average GWL. In ANFIS modeling, two main issues must be considered: the ANFIS structure (e.g., number and type of MFs) and the model calibration approach (e.g., number of iterations, performance criteria). In the ANFIS calibration stage, the number of epochs is a significant factor: a suitable number of iterations can raise the model's performance in both the calibration and verification stages and prevent over-training. Table 3 reports the results of the ANFIS models with the most optimal structure and input combination, i.e., 12 delays, the most appropriate number and type of MFs (i.e., 2 MFs), and the best number of iterations for the different forecasting periods. MFs such as the Gaussian curve (gaussmf), trapezoidal curve (trapmf), generalized bell (gbellmf), and triangular (trimf) were used in the ANFIS modeling. The best-performing ANFIS was trained with Gaussian MFs (gaussmf), and two rules were chosen as the satisfactory architecture; the results are presented in Table 3. Similar to the ANN and RBF models, the ANFIS models were suitable for GWL forecasts 1, 2, and 3 months ahead, but for 6 months ahead the basin's performance criteria are not satisfactory.
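The first layers of such an ANFIS can be illustrated numerically. The snippet below (a toy with assumed centers and spreads, not the calibrated model) computes Gaussian membership degrees for one sample and the normalized firing strengths of two rules, which is what the "2 MFs, two rules" architecture produces before the consequent layer.

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian membership function, analogous to MATLAB's gaussmf."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

# One sample with two inputs; MF centers/spreads are illustrative assumptions.
x = np.array([0.2, 0.8])
mu_low = gaussmf(x, c=0.0, sigma=0.5)    # "low" MF degree for each input
mu_high = gaussmf(x, c=1.0, sigma=0.5)   # "high" MF degree for each input

# Rule firing strengths: product of membership degrees, then normalized
# (ANFIS layers 2 and 3).
w = np.array([mu_low.prod(), mu_high.prod()])
w_norm = w / w.sum()
```

The consequent layer would then weight linear functions of the inputs by `w_norm`; those parameters are what the calibration epochs adjust.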

The DWT-ANFIS and M-DWT-ANFIS hybrid models combine the wavelets with the ANFIS model; as with the developed ANN and RBF models, the wavelet decomposition was introduced to minimize forecasting errors. The R, RMSE, NRMSE, and NSE values of the DWT-ANFIS and M-DWT-ANFIS models for each forecast horizon are shown in Table 3. The NSE and R values for the 1-, 2-, and 3-months-ahead forecasts in the verification stage are close to 1; the RMSE and NRMSE values are less than 0.25 and 3.7, respectively, for the DWT-ANFIS model, and less than 0.09 and 1.3, respectively, for the M-DWT-ANFIS model. Like the other models discussed above, the DWT-ANFIS and M-DWT-ANFIS models provide a suitable forecast of GWL for 1, 2, and 3 months ahead, but for the 6-months-ahead forecast the performance criteria for the study area were not satisfactory. The best models (i.e., the ANFIS, DWT-ANFIS, and M-DWT-ANFIS models) were used to forecast GWL, and the results are shown in Figures 14 and 15. The figures illustrate that the observed and simulated GWLs agree more closely for the M-DWT-ANFIS model than for the ANFIS model. The statistical measures show visible improvements from the ANFIS to the DWT-ANFIS and M-DWT-ANFIS models in the 1-month-ahead GWL forecast: the RMSE was reduced from 0.625 to 0.029 m and then to 0.008 m, respectively. Therefore, conjoining the ANFIS model with wavelets, especially several simultaneous wavelets, can improve its performance. Overall, the results show that the M-DWT-ANFIS model performs dramatically better than the ANFIS and DWT-ANFIS models in GWL forecasting, which indicates the high proficiency of the proposed model for forecasting with relatively fewer errors and higher correlation.

### Results of comparing the performance of AI models developed in forecasting GWL

This section compares the performance of the AI models and their hybrids with WT in the study basin. This study aims to develop the best data-driven time-series forecasting models for GWL fluctuations. In the calibration (training) stage, the RMSE values of the best ANN, RBF, and ANFIS models for GWL are 0.406, 0.399, and 0.396, respectively; in the verification (testing) stage, they are 0.605, 0.629, and 0.625, respectively. The RMSE values of the models are very close to each other, but the ANN model's verification RMSE is the smallest, which means the ANN model is somewhat better than the others at learning the behavior of the given data. However, the results show that, in terms of R, RMSE, NRMSE, and NSE for the 2-, 3-, and 6-months-ahead forecasts, the best ANFIS model is superior to the best ANN and RBF models (see Table 3). The choice of the best practical model must balance the statistical performance criteria in both the calibration and verification stages. In general, the ANFIS model was found to be better than the other models for forecasting the GWL of the studied basin. Figures 9, 12, and 14 show the observed GWL versus the GWL forecasted by the ANN, RBF, and ANFIS models in the calibration and verification stages. The results illustrate that, owing to noise in the GWL time series, the efficiency of the ANN, RBF, and ANFIS models is not favorable. Thus, in time-series forecasting, investigating and confronting the noise present in the GWL series is a fundamental issue, and de-noising or decomposing the series can increase modeling performance. According to Table 2, the effect of WT-decomposed inputs on the ANN, RBF, and ANFIS models is significant. The optimal AI models (i.e., the ANN, RBF, and ANFIS models) were selected for coupling in the hybrid DWT-AI and M-DWT-AI models.
The data were analyzed and decomposed by different DWT filters at different levels. For example, Figure 7 illustrates the approximation and detail subseries of the GWL time series decomposed by the db8 mother wavelet at level 2. The delayed GWL time series were analyzed using different mother wavelets at level 2; at this decomposition level, one approximation and two details are available for each time series. Notably, the approximation from the previous decomposition level (level 1) was also used in this study and improved the results. For the study area, as presented in Table 1, the combination of 1 to 12 delays of the GWL time series as input data produced the best model performance; Table 1 lists the input-data combinations and structures versus RMSE and R for the best ANNs. With GWL delays of 1 to 12 and a decomposition level of 2, information on the monthly, seasonal, and yearly trends and periodicities is included in the modeling inputs, so this combination created the best input for GWL modeling. Table 3 shows the results of the best proposed models, including DWT-ANN, DWT-RBF, DWT-ANFIS, M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS, with the best input combinations (i.e., GWL delays of 1 to 12), the best network structures, and parameter tuning after trial-and-error, for GWL forecasting 1, 2, 3, and 6 months ahead. The DWT-ANN, DWT-RBF, and DWT-ANFIS models provide significantly more accurate GWL forecasts than the ANN, RBF, and ANFIS models. For instance, the DWT-ANN model outperforms the simple ANN model in the verification stage, increasing the R value from 0.91 to 0.99 and decreasing the RMSE value from 0.605 to 0.017. The DWT-ANN hybrid model also forecasted GWLs more accurately than the DWT-RBF and DWT-ANFIS models.
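The 1-to-12-delay input structure described above amounts to a sliding window over the monthly series. The helper below is a generic sketch (hypothetical function name, assuming simple lagged inputs) of how such input/target pairs are built for a given forecast horizon:

```python
import numpy as np

def lagged_matrix(series, n_lags=12, horizon=1):
    """Build input/target pairs from 1..n_lags delayed values to forecast
    the level `horizon` months ahead."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])     # GWL(t-12) ... GWL(t-1)
        y.append(series[t + horizon - 1])  # target: GWL at the chosen horizon
    return np.array(X), np.array(y)

s = np.arange(120.0)                       # stand-in monthly GWL series
X, y = lagged_matrix(s, n_lags=12, horizon=1)
```

For the 2-, 3-, and 6-months-ahead experiments, only `horizon` changes; the 12-lag window stays the same.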
As presented in Table 3, the RMSE values in the verification stage for the 1-month-ahead GWL forecast are 0.017, 0.035, and 0.029 m for the DWT-ANN, DWT-RBF, and DWT-ANFIS models, respectively. The proposed M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models also performed better in GWL forecasting than the simple models (i.e., ANN, RBF, and ANFIS) and the single-wavelet hybrid models (i.e., DWT-ANN, DWT-RBF, and DWT-ANFIS) according to the R, RMSE, NRMSE, and NSE performance criteria. It is evident from Figures 9, 12, and 14 that employing multi-wavelet models increases the performance of wavelet-based models and reduces the uncertainty in forecasts. As shown in Table 2, the best single wavelet for GWL modeling of the basin is db45. The best multiple mother wavelets used together include db, sym, bior, coif, and fk wavelets; the filter lengths of the wavelets differ for each model. Based on the RMSE criterion, the ANN model developed with the mother wavelets db8, sym4, coif5, bior5.5, and fk6 provides the best performance for multi-time-step-ahead GWL forecasting. Figures 9, 12, and 14 compare the observed and forecasted GWLs using the best M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models for the 1-, 2-, 3-, and 6-months-ahead forecasts. Figures 10, 13, and 15 are scatterplots comparing the observed and forecasted GWLs of the best M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models and the best ANN, RBF, and ANFIS models for 1-, 2-, 3-, and 6-months-ahead forecasting during the calibration (training) and verification (testing) periods. The best M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models clearly have less scattered estimates, with values denser in the vicinity of the 1:1 line, than the best ANN, RBF, and ANFIS models.
In the verification stage, the RMSE values of the best M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models are 0.0001, 0.0068, and 0.0077 m, respectively, for GWL (*t* + 1); 0.0006, 0.035, and 0.031 m for GWL (*t* + 2); 0.0007, 0.071, and 0.074 m for GWL (*t* + 3); and 0.315, 0.499, and 0.345 m for GWL (*t* + 6). For example, for forecasting GWL (*t* + 3), the NRMSE in the verification stage of the M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models is 0.01, 1.05, and 1.24%, respectively. Therefore, the results indicate that coupling multiple wavelets with the ANN, RBF, and ANFIS models improves the forecasts relative to the plain ANN, RBF, and ANFIS models as well as the single-wavelet DWT-ANN, DWT-RBF, and DWT-ANFIS models. A general comparison among the best forecasting models at all time steps based on the R, RMSE, NRMSE, and NSE criteria also showed that the M-DWT-ANN model has higher R and NSE values and lower RMSE and NRMSE values than the M-DWT-RBF and M-DWT-ANFIS models, meaning it has a higher modeling ability; this is especially true for the 1-, 2-, and 3-months-ahead forecasts. Overall, the results showed that the proposed M-DWT-ANN model has the best performance in multi-step-ahead GWL forecasting in the study area (i.e., the lowest error and the highest correlation).
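The four criteria used throughout this comparison can be computed as below. This is a hedged sketch: NRMSE is normalized here by the observed mean, one of several common conventions, and may differ from the paper's exact definition.

```python
import numpy as np

def forecast_metrics(obs, sim):
    """Return R, RMSE, NRMSE (%), and NSE for observed vs. simulated series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    nrmse = 100.0 * rmse / np.mean(obs)  # normalization by the mean is an assumption
    r = np.corrcoef(obs, sim)[0, 1]      # Pearson correlation coefficient
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return r, rmse, nrmse, nse
```

A perfect forecast yields R = 1, RMSE = NRMSE = 0, and NSE = 1, which is why values "close to 1" for R and NSE and near-zero errors indicate the best models.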

## CONCLUSION

This research proposes a technique based on M-DWT hybridized with AI models for highly accurate GWL forecasting. The M-DWT technique is designed to overcome the weakness of the conventional DWT method in decomposing and mining the various hidden features of hydrological and natural data. Applying this technique increases the predictive accuracy of AI models and helps managers manage groundwater resources more effectively and sustainably. Hence, for a thorough evaluation, the M-DWT-ANN, M-DWT-RBF, and M-DWT-ANFIS models were compared to the single-wavelet DWT-ANN, DWT-RBF, and DWT-ANFIS models, and also to the simple ANN, RBF, and ANFIS models. The developed models were examined for GWL forecasting 1, 2, 3, and 6 months ahead in the Chamchamal-Bisotun plain, located in Kermanshah Province, western Iran.

For this purpose, the input data were first entered into the ANN, RBF, and ANFIS models without any preprocessing. Based on the R, RMSE, NRMSE, and NSE criteria, the ANFIS model performed better than the other models in forecasting the time steps ahead; this result conforms with the findings of Gong *et al.* (2016) and Mirzavand *et al.* (2015). The results illustrated that these models could not cope with the nonlinear and seasonal behavior of the data, especially for long-term forecasting. In the next step, the DWT was applied to the GWL time-series data, and the preprocessed data were used as input to the ANN, RBF, and ANFIS models; the coupled models were developed by hybridizing the DWT with the ANN, RBF, and ANFIS models. Using the DWT, each original time series was decomposed into sets of subsignals to increase the information available to the networks, and these subsignals were then used as forecasting inputs to the ANN, RBF, and ANFIS models. The DWT decomposed the noisy data and increased network perception by extracting seasonal and periodic signals from the GWL time series. Given the unique characteristics of each mother wavelet, analyzing the input time series with several different mother wavelets gives the model more understanding and efficiency in simulating GWL fluctuations. The M-DWT technique was therefore used for data preprocessing and for developing the hybrid models called M-DWT-AI. The M-DWT consists of db, coif, sym, fk, and bior wavelets applied to the GWL data simultaneously and together, whereas the DWT-AI hybrid model couples a single DWT with the AI models (such as ANN, RBF, and ANFIS). Since each time series decomposed by the DWTs contains part of the information and valuable properties of the signal (GWL data), combining several of them as preprocessed input data for the ANN, RBF, and ANFIS models significantly improved the performance of these models.
These steps led to improved training, increased accuracy, and reduced modeling error in the GWL forecasts of these models, and the developed models coped well with the various nonlinear features of the GWL process. This study showed that the best M-DWT-ANN model is significantly more accurate than the best ANN, RBF, and ANFIS models and their best hybrid models; these results conform with the findings of Adamowski & Chan (2011) and Ebrahimi & Rajaee (2017). It was also concluded that the results of all models and modeling techniques are particularly satisfactory for forecasting GWL 1, 2, and 3 months ahead, but not favorable for the 6-months-ahead forecast. The forecast results for the Chamchamal-Bisotun basin show that the M-DWT-ANN method is an effective and valuable method for GWL forecasting because it detects hidden and important hydrological features. Highly accurate GWL forecasting models such as the M-DWT-ANN model developed in this research can be a valuable and efficient tool for the optimal management of water resources and sustainable groundwater extraction. Given the advantages of the proposed method in forecasting GWL, using the M-DWT-AI technique in future studies on other aquifers in different geographical areas, and on other hydrological phenomena and events, is suggested.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## REFERENCES

*Artificial Neural Networks in Hydrology*. Water Science and Technology Library, Vol. 36. Kluwer Academic Publishers