Problems that arise during well logging lead to inaccurate recording of the physical parameters of the formation, which in turn affects the interpretation of the logging results. Bad holes, equipment damage, and economic constraints are some of the causes. Machine learning (ML) is a field that studies data and the patterns in it in order to build models that predict values or make decisions. Many researchers have applied machine learning to well logging, and one such application is the creation of synthetic well log data. This approach can compensate for data that are missing because of damaged formations or equipment defects.
In this case study, we apply several machine learning methods to a well logging dataset. Two cases are tested using the available well log parameters. The first case is the prediction of bulk density from six predictors: CGR, GR, NPHI, RS, RD, and RMSF. The second case is the prediction of Poisson's ratio from five predictors: CGR, GR, NPHI, DTDRT, and DTCRT. The dataset contains missing values (also known as NaN values), which must be handled before modeling. Each case is therefore given two different treatments. The first treatment fills in the missing data using k-nearest neighbor (k-NN) imputation. The second treatment removes the missing data. The results of the two treatments are compared to determine which one gives the best results.
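The two treatments can be illustrated with a minimal sketch. It assumes pandas and scikit-learn are available; the toy values, the column name `RHOB` for bulk density, and the choice of `n_neighbors=2` are illustrative assumptions and are not taken from the study.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy log table: GR and NPHI are named in the text,
# RHOB is an assumed mnemonic for bulk density, and all values are invented.
logs = pd.DataFrame({
    "GR":   [45.0, 60.0, np.nan, 80.0, 52.0],
    "NPHI": [0.20, np.nan, 0.25, 0.30, 0.22],
    "RHOB": [2.45, 2.50, 2.40, np.nan, 2.47],
})

# Treatment 1: fill each gap with the mean of the k most similar rows.
# k = 2 is an assumed value chosen only to suit this tiny table.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(logs), columns=logs.columns)

# Treatment 2: discard every row that has at least one missing value.
dropped = logs.dropna().reset_index(drop=True)

print("rows after imputation:", len(imputed))
print("rows after elimination:", len(dropped))
```

Imputation keeps every row at the cost of introducing estimated values, while elimination keeps only measured values at the cost of a smaller dataset, which is the trade-off the comparison is meant to assess.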
The synthetic well log generation produced 20 logs, obtained by combining the two cases and two treatments with 5 machine learning methods: linear regression, decision tree, random forest, gradient boosting, and artificial neural network (ANN). For the first case, the best model is the random forest with missing data elimination. For the second case, the best model is gradient boosting, also with missing data elimination. Across the 20 logs, the elimination of missing data was the more suitable treatment and gave the best results, as indicated by its lower root mean squared error (RMSE). In addition, a sensitivity analysis of the predictor parameters was carried out to find out which variables had the strongest influence on the results. For the first case, the most sensitive parameter is neutron porosity (NPHI). For the second case, the most sensitive parameter is Delta-T Shear (DTDRT), the interval transit time of the secondary (shear) wave.
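The model ranking and the sensitivity check can be sketched as follows. This is a sketch under stated assumptions rather than the study's procedure: the data are synthetic, only three of the five methods are shown, the hyperparameters are arbitrary, and permutation importance is an assumed stand-in because the text does not say how sensitivity was measured.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: three predictors, with the target deliberately
# dominated by the second column (labeled NPHI purely for illustration).
rng = np.random.default_rng(0)
names = ["GR", "NPHI", "RD"]
X = rng.normal(size=(300, 3))
y = 0.2 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A subset of the methods named in the text, with assumed settings.
models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# RMSE on held-out data: the square root of the mean squared residual.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))

# The model with the smallest RMSE is treated as the best one.
best = min(rmse, key=rmse.get)

# Assumed sensitivity measure: how much the score degrades when one
# predictor column is shuffled while the others are left intact.
result = permutation_importance(
    models[best], X_test, y_test, n_repeats=5, random_state=0
)
most_sensitive = names[int(np.argmax(result.importances_mean))]

print("RMSE per model:", rmse)
print("best model:", best)
print("most sensitive predictor:", most_sensitive)
```

Running the same loop once per case and per missing-data treatment would give the kind of RMSE table from which the best model and treatment are selected.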