
%pip install anai-opensource
import anai
from anai.preprocessing import Preprocessor
df = anai.load(
    df_filepath='/Users/arshanwar/Projects/AutoML/open_source/ANAI/examples/healthcare-dataset-stroke-data.csv')
Loading Data [*]
Data Loaded Successfully [ ✓ ]
df.head()
      id  gender   age  hypertension  heart_disease  ever_married      work_type  Residence_type  avg_glucose_level   bmi   smoking_status  stroke
0   9046    Male  67.0             0              1           Yes        Private           Urban             228.69  36.6  formerly smoked       1
1  51676  Female  61.0             0              0           Yes  Self-employed           Rural             202.21   NaN     never smoked       1
2  31112    Male  80.0             0              1           Yes        Private           Rural             105.92  32.5     never smoked       1
3  60182  Female  49.0             0              0           Yes        Private           Urban             171.23  34.4           smokes       1
4   1665  Female  79.0             1              0           Yes  Self-employed           Rural             174.12  24.0     never smoked       1
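anai.load hands back a DataFrame here. To inspect the same kind of file without ANAI, the standard library is enough. The sketch below is illustrative only: the helper names are hypothetical, and treating empty strings and "N/A" as missing is an assumption about how the raw CSV marks the absent bmi values that df.head() displays as NaN.

```python
import csv
import io

# Raw tokens treated as missing; this set is an assumption about the file.
MISSING_TOKENS = {"", "N/A", "NA", "NaN"}


def parse_value(raw):
    """Convert one CSV field to None, int, float, or leave it as a string."""
    if raw is None or raw.strip() in MISSING_TOKENS:
        return None
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def load_rows(text: str) -> list:
    """Parse CSV text (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: parse_value(val) for key, val in row.items()} for row in reader]


def head(rows: list, n: int = 5) -> list:
    """Return the first n rows, mirroring DataFrame.head()."""
    return rows[:n]


sample = (
    "id,gender,age,bmi,stroke\n"
    "9046,Male,67.0,36.6,1\n"
    "51676,Female,61.0,N/A,1\n"
)
for row in head(load_rows(sample)):
    print(row)
```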
prep = Preprocessor(dataset=df, target='stroke')
summary = prep.summary()
summary.head(10)
                       Stats
No. of Cells           61320
No. of Variables          12
No. of Records          5110
Missing Cells          0.3 %
Missing Cells Count      201
Duplicacy             0.00 %
Duplicate Cell Count       0
Anomaly Count            256
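The headline numbers are internally consistent: No. of Cells is No. of Records times No. of Variables (5110 × 12 = 61320), and Missing Cells is 201 / 61320 ≈ 0.33 %, shown rounded as 0.3 %. A minimal sketch of the same bookkeeping over parsed rows, assuming None marks a missing cell and that a duplicate is a row identical to an earlier one. ANAI's own duplicate and anomaly rules are not shown in this output, so anomaly counting is left out and the function name is hypothetical.

```python
def dataset_summary(rows: list) -> dict:
    """Dataset-level counts; None marks a missing cell."""
    n_records = len(rows)
    n_variables = len(rows[0]) if rows else 0
    n_cells = n_records * n_variables
    missing = sum(1 for row in rows for value in row.values() if value is None)

    # Assumption: a row is a duplicate if an identical row appeared earlier.
    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            duplicate_rows += 1
        seen.add(key)

    return {
        "cells": n_cells,
        "variables": n_variables,
        "records": n_records,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "missing_count": missing,
        "duplicate_rows": duplicate_rows,
    }


# Check the headline arithmetic from the Stats table.
print(5110 * 12)                    # 61320
print(round(100 * 201 / 61320, 1))  # 0.3
```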
column_summary = prep.column_summary()
column_summary.head(24)
id: Type Error = ID column is not allowed, hide = True, NaN in every other row.
Type Error and hide are NaN for all other columns.

Categorical columns (every statistic not shown reads "NA as column dtype is Categorical"):
                           gender  ever_married    work_type  Residence_type  smoking_status
Type                  Categorical   Categorical  Categorical     Categorical     Categorical
Missing Value %               0.0           0.0          0.0             0.0             0.0
No. of Unique Values            3             2            5               2               4

Numeric columns:
                                age  hypertension  heart_disease  avg_glucose_level                 bmi   stroke
Type                        Numeric       Numeric        Numeric            Numeric             Numeric  Numeric
Missing Value %                 0.0           0.0            0.0                0.0  3.9334637964774952      0.0
Mean                          43.23          0.10           0.05             106.15               28.89     0.05
Mode                          78.00          0.00           0.00              93.88               28.70     0.00
Maximum value                 82.00          1.00           1.00             271.74               97.60     1.00
Median                        45.00          0.00           0.00              91.88               28.10     0.00
Minimum value                  0.08          0.00           0.00              55.12               10.30     0.00
Standard Deviation            22.61          0.30           0.23              45.28                7.85     0.22
99% Quartile                  82.00          1.00           1.00             240.71               53.40     1.00
90% Quartile                  75.00          0.00           0.00             192.18               38.90     0.00
66% Quartile                  55.00          0.00           0.00             104.08               31.00     0.00
33% Quartile                  32.00          0.00           0.00              81.78               25.10     0.00
10% Quartile                  11.00          0.00           0.00              65.79               19.70     0.00
1% Quartile                    1.08          0.00           0.00              56.33               15.10     0.00
Variance                     511.33          0.09           0.05            2050.60               61.69     0.05
Monotonic                      0.00          0.00           0.00               0.00                0.00     0.00
Mean Absolute Deviation       19.12          0.18           0.10              33.06                5.98     0.09
No. of Unique Values            104             2              2               3979                 419        2
No. of Negative Values            0             0              0                  0                   0        0
Percentage Infinite Values      0.0           0.0            0.0                0.0                 0.0      0.0
Skewness                       0.14          2.72           3.95               1.57                1.06     4.19
Shapiro_W                      0.97          0.34           0.24               0.81                 nan     0.22
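Each numeric statistic in the column summary is a standard descriptive. The bmi gap accounts for all 201 missing cells (201 / 5110 = 3.93 %), and the rows labelled "Quartile" are really percentiles (1st, 10th, ..., 99th). The output does not say which estimators ANAI uses, so this sketch picks common ones and names them: the sample standard deviation, linear-interpolation percentiles, and the moment-based skewness m3 / m2^1.5. Libraries such as pandas apply a small-sample correction to skewness, so values can differ slightly. The function name is hypothetical.

```python
import math
import statistics


def column_summary(values: list) -> dict:
    """Descriptive statistics for one numeric column; None marks a missing value.

    Needs at least two observed values (stdev and quantiles require them).
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = statistics.fmean(observed)
    # Population central moments, used for the moment-based skewness m3 / m2**1.5.
    m2 = sum((x - mean) ** 2 for x in observed) / n
    m3 = sum((x - mean) ** 3 for x in observed) / n
    # 99 cut points: pct[0] is the 1st percentile, pct[98] the 99th.
    # method="inclusive" interpolates linearly between order statistics.
    pct = statistics.quantiles(observed, n=100, method="inclusive")
    return {
        "missing_pct": 100 * (len(values) - n) / len(values),
        "mean": mean,
        "median": statistics.median(observed),
        "min": min(observed),
        "max": max(observed),
        "std": statistics.stdev(observed),  # sample (n - 1) denominator
        "p10": pct[9],
        "p90": pct[89],
        "mean_abs_dev": sum(abs(x - mean) for x in observed) / n,
        "n_unique": len(set(observed)),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
    }


print(column_summary([1.0, 2.0, 3.0, 4.0, None]))
print(100 * 201 / 5110)  # share of bmi values missing, in percent
```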
df1 = prep.impute(method='mean')
df1.isna().sum()
id                   0
gender               0
age                  0
hypertension         0
heart_disease        0
ever_married         0
work_type            0
Residence_type       0
avg_glucose_level    0
bmi                  0
smoking_status       0
stroke               0
dtype: int64
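Mean imputation fills each missing numeric value with the mean of that column's observed values. Only bmi had gaps, so, assuming the mean is taken over the observed values, each of its 201 empty cells receives roughly 28.89, the bmi mean reported in the column summary. A plain-Python sketch of the technique, not ANAI's code; the helper names are hypothetical.

```python
import statistics


def impute_mean(values: list) -> list:
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def impute_columns(rows: list, columns: list) -> list:
    """Return copies of the row dicts with each listed column mean-imputed."""
    filled = [dict(row) for row in rows]
    for col in columns:
        new_values = impute_mean([row[col] for row in rows])
        for row, value in zip(filled, new_values):
            row[col] = value
    return filled


rows = [{"bmi": 36.6}, {"bmi": None}, {"bmi": 32.5}]
print(impute_columns(rows, ["bmi"]))
```

The copies leave the input rows untouched, which mirrors how prep.impute returns a new frame (df1) here.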
features, labels = prep.encode(split=True)
features.head(4)
      id    gender   age  hypertension  heart_disease  ever_married  work_type  Residence_type  avg_glucose_level   bmi  smoking_status
0   9046  0.048728  67.0             0              1      0.048728   0.048728        0.048728             228.69  36.6        0.048728
1  51676  0.048728  61.0             0              0      0.524364   0.048728        0.048728             202.21   NaN        0.048728
2  31112  0.524364  80.0             0              1      0.682909   0.524364        0.524364             105.92  32.5        0.524364
3  60182  0.524364  49.0             0              0      0.762182   0.682909        0.524364             171.23  34.4        0.048728
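The encoded frame replaces each categorical string with a number. The output does not say which scheme ANAI applies, so the sketch below shows target (mean) encoding, one common choice for a binary label like stroke, purely as an illustration; it is not a reconstruction of the values above, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def target_encode(categories: list, target: list, smoothing: float = 0.0) -> list:
    """Replace each category with the mean target value observed for it.

    With smoothing > 0, each category mean is pulled toward the global mean,
    which stabilises rare categories.
    """
    global_mean = sum(target) / len(target)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return [encoding[cat] for cat in categories]


# Toy data: two categories with binary labels.
print(target_encode(["a", "a", "b", "b", "b"], [1, 0, 1, 1, 0]))
```

If you use this kind of encoding, fit it on the training split only and reuse that mapping for validation rows; computing it from the full dataset leaks validation labels into the features.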
X_train, X_val, y_train, y_val, scaler = prep.prepare(
    features, labels, test_size=0.2, random_state=42, smote=False, k_neighbors=3)
X_train.shape, X_val.shape, y_val.shape, y_train.shape
((4088, 11), (1022, 11), (1022,), (4088,))
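The shapes follow from test_size=0.2: 0.2 × 5110 = 1022 validation rows and 5110 - 1022 = 4088 training rows, each with the 11 feature columns (stroke is the label). Below is a stdlib sketch of a seeded shuffle split plus a z-score scaler fitted on the training rows only. Whether prepare stratifies by label or returns this kind of scaler is not visible in the output, so both are assumptions, the names are hypothetical, and Python's random module will not reproduce the exact rows that random_state=42 selects in ANAI.

```python
import math
import random
import statistics


def train_val_split(n_rows: int, test_size: float, seed: int):
    """Return (train_indices, val_indices) after a seeded shuffle."""
    n_val = math.ceil(n_rows * test_size)  # round the validation share up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[n_val:], indices[:n_val]


class ZScoreScaler:
    """Per-column (x - mean) / std scaling, fitted on training rows only."""

    def fit(self, rows: list) -> "ZScoreScaler":
        columns = list(zip(*rows))
        self.means = [statistics.fmean(col) for col in columns]
        # Population std; constant columns get 1.0 to avoid dividing by zero.
        self.stds = [statistics.pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows: list) -> list:
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]


train_idx, val_idx = train_val_split(5110, test_size=0.2, seed=42)
print(len(train_idx), len(val_idx))  # 4088 1022
```

With only about 5 % positive labels, a stratified split (keeping that ratio in both parts) is a common refinement; this sketch does not stratify.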
ai = anai.run(
    filepath='/Users/arshanwar/Projects/AutoML/open_source/ANAI/examples/healthcare-dataset-stroke-data.csv',
    target='stroke',
    predictor=['rfc', 'cat', 'xgb', 'lgbm', 'ext'],
)
ANAITaskWarning: Task is getting detected automatically. To suppress this behaviour, set suppress_task_detection=True and specify task with task argument
Task: Classification
░█████╗░███╗░░██╗░█████╗░██╗
██╔══██╗████╗░██║██╔══██╗██║
███████║██╔██╗██║███████║██║
██╔══██║██║╚████║██╔══██║██║
██║░░██║██║░╚███║██║░░██║██║
╚═╝░░╚═╝╚═╝░░╚══╝╚═╝░░╚═╝╚═╝
Started ANAI [ ✓ ]
Preprocessing Started [*]
Imputing Missing Values by mean [*]
Imputing Done [ ✓ ]
Preprocessing Done [ ✓ ]
Training ANAI [*]
Ensembling on top 5 models
Training Done [ ✓ ]
Results Below
   Name                       Accuracy  Cross Validated Accuracy
0  Random Forest Classifier  99.412916                 99.363824
1  Stacking Ensembler        99.510000                 99.340000
2  Extra Trees Classifier    99.412916                 99.314864
3  Max Voting Ensembler      99.410000                 99.310000
4  CatBoost Classifier       99.412916                 99.265904
5  XGBoost Classifier        99.412916                 99.241574
6  LightGBM Classifier       99.412916                 99.143715
Completed ANAI Run [ ✓ ]
Saved Best Model to anai_info/best/classifier/models/Random_Forest_Classifier_1655501647.pkl and its scaler to anai_info/best/classifier/scalers/Random_Forest_Classifier_Scaler_1655501647.pkl
Time Elapsed : 144.39 seconds
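Two points help read this table. First, the Cross Validated Accuracy column averages accuracy over several train/test folds; the fold count ANAI uses is not shown in this output. Second, the column summary reports a stroke mean of 0.05, so only about 5 % of records are positive, and a model that always predicts 0 would already score around 95 % accuracy; the 99 % figures are best read against that baseline rather than against 50 %. The sketch below uses hypothetical helper names and contiguous, unshuffled folds for simplicity (real cross-validation usually shuffles or stratifies). It computes k-fold accuracy for any fit-and-predict function and includes that majority-class baseline.

```python
from collections import Counter


def accuracy(y_true: list, y_pred: list) -> float:
    """Percentage of predictions that match the true labels."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n: int, k: int) -> list:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(X: list, y: list, fit_predict, k: int = 5) -> float:
    """Mean accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def majority_class(X_train: list, y_train: list, X_test: list) -> list:
    """Baseline model: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


# Toy labels with 5 % positives, echoing the stroke mean of 0.05.
y = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]
print(cross_val_accuracy(X, y, majority_class, k=5))  # 95.0
```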
ai.explain('perm')
Explaining Best ANAI model [*]
Explaining ANAI Done [ ✓ ]
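ai.explain('perm') prints only progress messages here. Assuming 'perm' stands for permutation importance, the idea is: score the fitted model, shuffle one feature column to break its link with the label, re-score, and record the drop. A model-agnostic plain-Python sketch with hypothetical names; scikit-learn ships a full version as sklearn.inspection.permutation_importance.

```python
import random


def permutation_importance(predict, X: list, y: list, score,
                           n_repeats: int = 5, seed: int = 0) -> list:
    """Mean drop in score when each feature column is shuffled independently.

    predict(rows) returns predictions; score(y_true, y_pred) is higher-is-better.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with column j replaced by its shuffled version.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true: list, y_pred: list) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(rows: list) -> list:
    """Toy model that only looks at feature 0."""
    return [row[0] for row in rows]


# Feature 0 determines the label; feature 1 is a constant.
X = [[i % 2, 7] for i in range(20)]
y = [i % 2 for i in range(20)]
print(permutation_importance(predict, X, y, accuracy))
```

A constant or ignored feature scores zero, while a feature the model relies on shows a positive drop.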
ai.result()
   Name                       Accuracy  Cross Validated Accuracy  Model
0  Random Forest Classifier  99.412916                 99.363824  (DecisionTreeClassifier(max_features='auto', r...
1  Stacking Ensembler        99.510000                 99.340000  StackingClassifier(cv=10,\n ...
2  Extra Trees Classifier    99.412916                 99.314864  (ExtraTreeClassifier(random_state=840703915), ...
3  Max Voting Ensembler      99.410000                 99.310000  VotingClassifier(estimators=[('Random Forest C...
4  CatBoost Classifier       99.412916                 99.265904  <catboost.core.CatBoostClassifier object at 0x...
5  XGBoost Classifier        99.412916                 99.241574  XGBClassifier(base_score=0.5, booster='gbtree'...
6  LightGBM Classifier       99.412916                 99.143715  LGBMClassifier()
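The Model column shows that the Max Voting Ensembler is a scikit-learn VotingClassifier over the base models. Its arguments are truncated, so whether it votes on labels ("hard") or averages probabilities ("soft") cannot be read from this output; the name suggests hard voting. A sketch of hard (majority) voting over per-model label predictions, with a hypothetical function name and toy predictions:

```python
from collections import Counter


def hard_vote(predictions: list) -> list:
    """Majority vote across models.

    predictions holds one list of labels per model, all the same length.
    Ties go to the tied label that appears first in model order,
    because Counter keeps insertion order.
    """
    return [Counter(row_votes).most_common(1)[0][0] for row_votes in zip(*predictions)]


# Toy predictions from three hypothetical models on three rows.
model_a = [0, 1, 1]
model_b = [0, 1, 0]
model_c = [1, 1, 0]
print(hard_vote([model_a, model_b, model_c]))  # [0, 1, 0]
```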