Today, the combination of massive computational power and abundant data means that smart, data-driven decision systems can help people make better decisions across many sectors. One field that stands to benefit tremendously is medicine and healthcare.
ML-based systems can automate and manage administrative tasks, potentially saving about a trillion US dollars. More importantly, they can analyze past data to predict the risk associated with individual patients, helping to prevent loss of life before it happens. Identifying fatal or non-fatal diseases at an early stage and taking precautions or corrective action in advance can save countless lives and hospital trips. The medical industry also generates a huge amount of data, which can be fed into ML systems to support numerous other life-saving and life-enhancing use cases.
Some of the advantages of AI in healthcare:
For this implementation, we used the ANAI platform to automate the data and feature engineering processes on the heart rate prediction data set. With ANAI, any user can process data efficiently so that it is ready for the model building pipeline and further use. This step transforms the data into a form the ML model can interpret easily, which in turn makes prediction easier. For data to be ML-ready, however, it must meet certain requirements, so it goes through pre-processing steps such as data analysis, wrangling, transformation, and encoding.
ANAI’s Data Engineering pipeline automates the data pre-processing. It offers more than 100 data ingestion methods to give flexibility when importing data, conducts a thorough pre-analysis of the data to understand its features and distribution, and then proceeds to data wrangling, where missing and duplicate values are dealt with. For the Feature Engineering part, the platform applies automated feature engineering to detect the features that affect the predictions the most, and finally summarizes the data, giving it a health score.
ANAI performs automated data wrangling on the data to make it easier to process and interpret for further analysis and model building procedures. More specifically, ANAI cleans and transforms the data from one form to another to make it favorable for drawing valuable insights.
– Handling missing data: If segments of the data are missing, a model built on it can produce predictions that are biased or skewed toward a particular class, leading to inaccurate analysis and misleading decisions. ANAI handles missing values automatically.
– Categorical encoding: ML algorithms work better with numerical data than with categorical or other non-numeric data, so ANAI automatically converts all categorical data to integer type to make it readable by the model.
– Handling imbalanced data: Imbalanced data sets have an uneven distribution of classes. For a model to run optimally and predict accurately, the data should be balanced and not skewed, because skewed data introduces bias into the predictions. To counter this skewness, ANAI applies normalization and scaling, which reduce the skewness of the data.
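The three wrangling steps above can be sketched in a few lines of pandas. This is a minimal illustration under stated assumptions, not ANAI's internal implementation: the toy values, the "condition" column, and the choice of median imputation and standard scaling are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data; values and the "condition" column are hypothetical,
# not taken from the actual heart rate prediction data set.
df = pd.DataFrame({
    "LF": [885.2, np.nan, 910.4, 1200.7, 885.2],
    "HF": [12.1, 15.3, np.nan, 98.6, 12.1],
    "condition": ["no stress", "interruption", "time pressure", "no stress", "no stress"],
})

# 1. Drop exact duplicate rows, then fill missing numeric values with the column median.
clean = df.drop_duplicates().reset_index(drop=True)
numeric_cols = clean.select_dtypes(include="number").columns
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# 2. Encode the categorical column as integer codes.
clean["condition"] = clean["condition"].astype("category").cat.codes

# 3. Standardize the numeric columns to zero mean and unit variance.
col_mean = clean[numeric_cols].mean()
col_std = clean[numeric_cols].std(ddof=0)
clean[numeric_cols] = (clean[numeric_cols] - col_mean) / col_std

print(clean)
```

Median imputation is used here because it is less sensitive to outliers than the mean; other strategies may suit other data sets better.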
Healthcare is one of the sectors where we expect data to be free of errors, anomalies, and biases. Even slight skewness or errors in important data sets, such as the heart rate prediction data, can change the predictions by a huge margin. A poorly working model can in turn lead to improper care, or sometimes even neglect, on the doctor’s side. To keep the patient’s health the top priority and avoid wrong predictions from the model, we need to handle the imbalance and other errors within the data. Once deployed, ANAI can reduce the imbalance and regulate the skewness of the features automatically within minutes.
When data is ingested into the platform, it goes through a pre-analysis that generates a table listing the problems in the data, such as missing values, duplicates, and anomaly count. The data then receives a score based on its quality. Below that, we also get a summary of the features with basic statistical measurements.
For skewed data sets, like the one used here, ANAI normalizes the data, managing skewness by reducing the outliers that can adversely affect the model’s performance.
After normalizing the data, ANAI provides a detailed list of results for the transformation of each individual feature, along with the mean, skewness, and standard deviation both before and after.
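A pre-analysis of this kind can be approximated with pandas. The sketch below is only an illustration: the function name quality_report is hypothetical, the 1.5 * IQR rule is an assumed definition of "anomaly", and ANAI's quality score is not reproduced because its formula is not described here.

```python
import numpy as np
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing-value and outlier counts.

    Outliers use the 1.5 * IQR rule, an assumed definition of "anomaly".
    """
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "outliers": outliers.reindex(df.columns),  # NaN for non-numeric columns
    })

# Toy data with one missing value, one extreme value, and one duplicate row.
df = pd.DataFrame({
    "HF": [10.0, 11.0, 12.0, 11.5, 10.5, 250.0, 10.0],
    "LF": [1.0, np.nan, 1.2, 1.1, 0.9, 1.0, 1.0],
})

report = quality_report(df)
duplicate_rows = int(df.duplicated().sum())

print(report)
print("duplicate rows:", duplicate_rows)
print(df.describe())  # basic statistics per feature
```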
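A before-and-after comparison like the one described can be illustrated as follows. The transformation ANAI applies is not specified in this article, so the sketch assumes a simple log(1 + x) transform on a synthetic right-skewed feature.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed feature standing in for a column such as HF.
raw = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="HF")

# log(1 + x) compresses the long right tail (an assumed choice of transform).
transformed = np.log1p(raw)

summary = pd.DataFrame(
    {
        "before": [raw.mean(), raw.std(), raw.skew()],
        "after": [transformed.mean(), transformed.std(), transformed.skew()],
    },
    index=["mean", "std", "skewness"],
)
print(summary.round(3))
```

A skewness value closer to zero after the transform indicates a more symmetric distribution with a less dominant tail.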
Feature engineering is one of the most important aspects of any ML project. ANAI’s feature engineering pipeline selects the best available features and transforms the others into more useful ones, using automated feature engineering methods built into the platform itself. Some of the methods that ANAI deploys are:
Recursive feature elimination (RFE)
This method selects the features that affect the target variable the most. Starting with all the features, it eliminates them step by step and retains those that give the best performance. In RFE, features are ranked by the model’s “coef_” or “feature_importances_” attributes. A small number of the lowest-ranked features is removed in each loop, and the model is refit on the remainder, which helps remove dependencies and collinearities among the features.
Recursive feature elimination reduces the number of features, which increases model efficiency. The procedure is repeated until no more features are left to delete or the specified number of features is reached.
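The elimination loop described above is available in scikit-learn as the RFE class. The sketch below uses it on synthetic data; the data set, the linear estimator, and the target of 4 features are assumptions for illustration, not ANAI's configuration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic regression data: 10 features, only 4 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)

# Remove one feature per iteration (step=1) until 4 remain.
# Features are ranked by the magnitude of the fitted estimator's coef_.
selector = RFE(estimator=LinearRegression(), n_features_to_select=4, step=1)
selector.fit(X, y)

print("kept feature mask:", selector.support_)
print("elimination ranking (1 = kept):", selector.ranking_)
```

Features with ranking 1 are the ones retained; higher numbers were eliminated earlier.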
Expectation maximization (EM)
This method estimates the missing values of latent variables from the observable data in the data set (the expectation stage) and then uses those estimates to update the parameter values (the maximization stage). In a statistical model with latent variables, it can be used to find local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates of the parameters. Expectation maximization assigns each data point to a cluster probabilistically. With two clusters, say a red one and a yellow one, we calculate for each point the likelihood that it came from the red cluster and from the yellow cluster. In the maximization stage, the parameters of each cluster are then updated based on the points assigned to it.
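The two alternating stages can be sketched with NumPy for a one-dimensional mixture of two Gaussian clusters. The data, the initial guesses, and the fixed 50 iterations are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 1-D clusters (the "red" and "yellow" groups).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# Initial guesses for the mixture weights, means, and variances.
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: probability that each point belongs to each cluster (soft assignment).
    diff = x[:, None] - means
    dens = np.exp(-diff ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
    resp = weights * dens
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update each cluster's parameters from its responsibility-weighted points.
    n_k = resp.sum(axis=0)
    weights = n_k / len(x)
    means = (resp * x[:, None]).sum(axis=0) / n_k
    variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k

print("weights:", weights.round(3))
print("means:", means.round(3))
print("variances:", variances.round(3))
```

Because EM only finds a local optimum, the result depends on the initial guesses; in practice the algorithm is often run from several starting points.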
The above table shows the feature summary of the heart rate prediction data set, and it illustrates what ANAI brings to the process. For each case, the table lists the number of features used to train a model, the R2 score that model achieved, and the features that were selected.
We can clearly see that the highest R2 score, 95.95, was achieved when only 23 features were selected for training. When the feature engineering pipeline selected all 31 features, the resulting R2 score was 95.77, and with just two features it was 56.07.
We can conclude that these 23 features, which include VLF, LF, HF, and HF_PCT, give the best score and carry the most information for the model.
Another benefit of feature engineering is reduced computational time during training, because the model has fewer features to train on. As the number of features increases, the volume of the feature space the model has to work through grows exponentially, a problem known as the curse of dimensionality. Reducing the feature space therefore saves a lot of time and energy during training.
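A comparison of R2 scores across feature counts can be reproduced in outline with scikit-learn. The sketch below uses synthetic data and a linear model, so its scores are unrelated to the figures reported above; the feature counts 2, 5, and 12 are arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 12 features, 5 of which are informative.
X, y = make_regression(
    n_samples=300, n_features=12, n_informative=5, noise=10.0, random_state=0
)

scores = {}
for k in (2, 5, 12):
    # Selection sits inside the pipeline, so each cross-validation fold
    # selects features using only its own training split.
    model = make_pipeline(
        RFE(LinearRegression(), n_features_to_select=k),
        LinearRegression(),
    )
    scores[k] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

best_k = max(scores, key=scores.get)
for k, r2 in scores.items():
    print(f"{k:>2} features: mean R2 = {r2:.4f}")
print("best feature count:", best_k)
```

Keeping the selection step inside the cross-validation loop avoids leaking information from the validation folds into the choice of features.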
In the United States, health care expenditures account for approximately US $3.5 trillion of the US $19.4 trillion gross domestic product (GDP), or 18%. In Germany, the figure is approximately US $0.4 trillion of US $3.7 trillion (11.5%). This makes economic impact assessment increasingly important.
In some tasks, AI can make diagnoses as accurately as world-leading medical experts. For example, one AI-driven algorithm has been reported to detect metastatic breast cancer with 99 percent accuracy, and researchers at Stanford University created a machine learning algorithm reported to predict death with 90 percent accuracy. Capabilities like these can be expected to reduce costs substantially.
Using the results and inferences obtained from ANAI helps doctors be proactive instead of reacting when a crisis occurs. The goal is to avert the catastrophe rather than to mitigate the harm once it has happened. ANAI’s data-driven system is effective in providing patients with high-quality care. Predictive models in healthcare can help analyze a patient’s data and deliver accurate treatment, whether the aim is cutting waiting times or lowering the frequency of re-admissions.
Other benefits of ANAI’s AutoML pipeline include using its predictions to improve infrastructure, resource planning, business operations, and management, areas where minute errors can lead to fatal situations. Everything needs to be in sync and streamlined, and hectic situations need to be anticipated and addressed beforehand. Outputs from disease prediction make this possible and let patients receive faster treatment. Healthcare centers can also create a less stressful work environment by automating routine operations, allowing staff to focus on providing courteous and efficient patient service.
Results obtained also help answer questions like –
ANAI also lets users scale ML models on a fully automated platform that is easy to use while still offering flexibility and customization. The only input needed is data; result generation is automated, with little to no human involvement.
This case study showed how ANAI performs automated data wrangling and feature engineering to find the best features and make accurate predictions. Healthcare is one sector where the need for optimal features is especially great. We covered why feature selection matters here and how ANAI simplifies the entire process.
Using the heart rate prediction data set, we demonstrated how the ANAI platform works and how it can be used for your own purposes. ANAI can also handle imbalanced data and support feature selection and extraction, which makes it possible to build models with accurate predictions, improve patients’ conditions, and plan proper treatment to ensure their safety. ANAI’s easy-to-use, end-to-end data, feature, and ML pipeline allows healthcare organizations to put patients’ health and safety back at the top of their priorities.
To implement such solutions or to get a personalized solution for your niche use case, contact us at firstname.lastname@example.org or visit www.anai.io/contact/