
[Palantir Foundry] Titanic Survivor Prediction Project (2/2) — Model Training & Workshop Dashboard

Train a baseline model in Foundry Code Workspaces (JupyterLab), export predictions for Kaggle, and build a simple dashboard in Workshop using Ontology.

Tags: Palantir, Foundry, Titanic, ML, Workshop, Kaggle

This post continues from Part 1 (preprocessing). Using titanic_cleaned_train, we go through: (1) model training in JupyterLab → (2) Kaggle submission → (3) visualization with a Workshop dashboard.

1. Create a Code Workspace (JupyterLab)

1) Go back to the Titanic project home and select Application.

Application

2) Search for code → open Code Workspaces → click Create new.

Create code workspace

3) Select JupyterLab → click Continue.

Select JupyterLab

4) In Select location, choose the project Titanic → click Continue.

Select location

5) Review the summary and click Create.

Create

6) After a short wait, you should see a page like this.

Workspace created

7) Open the Data panel → click Add Data and load the datasets you created.

Data tab
Add Data
Select datasets

8) Create a new Python environment for the notebook and install the scikit-learn package into it.

Create Python environment
Install scikit-learn
Install done

2. Load data & train a baseline model (RandomForest)

The following is a baseline setup to quickly validate end-to-end training and prediction.

2.1 Load data

# 1. Load data
from foundry.transforms import Dataset
import pandas as pd
import numpy as np

titanic_cleaned_train = Dataset.get("titanic_cleaned_train").read_table(format="pandas")
test = Dataset.get("test").read_table(format="pandas")

print("✅ Data loading complete!")
print(f"Train data: {titanic_cleaned_train.shape}")
print(f"Test data: {test.shape}")
print("\nColumn list:")
print(titanic_cleaned_train.columns.tolist())

Load data

If you see warnings, you can usually ignore them as long as the data loads successfully.

2.2 Preprocess (build training features)

# 2. Data preprocessing and feature preparation

# Add Sex_Encoded to Test data
test["Sex_Encoded"] = test["Sex"].map({"male": 0, "female": 1})

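# Missing value handling
# Assign the result back instead of calling fillna(..., inplace=True) on a
# selected column, which newer pandas versions warn about and may not apply.
test["Age"] = test["Age"].fillna(titanic_cleaned_train["Age"].median())
test["Fare"] = test["Fare"].fillna(test["Fare"].median())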

# Select features to use for learning
features = ["Pclass", "Sex_Encoded", "Age", "SibSp", "Parch", "Fare"]

# X, y separation
X_train = titanic_cleaned_train[features]
y_train = titanic_cleaned_train["Survived"]
X_test = test[features]

print("✅ Preprocessing completed!")
print(f"\nX_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_test shape: {X_test.shape}")
print("\nCheck for missing values:")
print(f"Train: {X_train.isnull().sum().sum()}")
print(f"Test: {X_test.isnull().sum().sum()}")

Preprocessing
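The same preprocessing can be wrapped in a small function that leaves the input frame untouched, which makes re-running cells safer. This is a minimal sketch on toy data, not the notebook's actual code: `prepare_test_features`, `toy_train`, and `toy_test` are hypothetical names, and unlike the cell above it fills `Fare` from the training median so that every imputation value comes from the training set.

```python
import pandas as pd

FEATURES = ["Pclass", "Sex_Encoded", "Age", "SibSp", "Parch", "Fare"]


def prepare_test_features(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Return the model features for `test` without mutating either input."""
    out = test.copy()
    # Same encoding as in the post: male -> 0, female -> 1
    out["Sex_Encoded"] = out["Sex"].map({"male": 0, "female": 1})
    # Assumption: both imputation values are taken from the training data
    out["Age"] = out["Age"].fillna(train["Age"].median())
    out["Fare"] = out["Fare"].fillna(train["Fare"].median())
    return out[FEATURES]


# Toy stand-ins for titanic_cleaned_train and test (hypothetical values)
toy_train = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.28, 7.92]})
toy_test = pd.DataFrame(
    {
        "Pclass": [3, 1],
        "Sex": ["male", "female"],
        "Age": [None, 47.0],
        "SibSp": [0, 1],
        "Parch": [0, 0],
        "Fare": [7.83, None],
    }
)

X_toy = prepare_test_features(toy_train, toy_test)
print(X_toy)
print("Missing values left:", int(X_toy.isnull().sum().sum()))
```

Because the function works on a copy, `toy_test` still contains its original missing values afterwards, so the cell can be re-run with different imputation choices.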

2.3 Train a model (RandomForest) + cross validation

# 3. Random forest model training
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42,
)

print("🚀 Training the model...")
rf_model.fit(X_train, y_train)

cv_scores = cross_val_score(rf_model, X_train, y_train, cv=5, scoring="accuracy")

print("✅ Model training complete!")
print("\n📊 Cross-Validation Accuracy:")
print(f"   average: {cv_scores.mean():.4f}")
print(f"   standard deviation: {cv_scores.std():.4f}")
print(f"   Each Fold: {[f'{score:.4f}' for score in cv_scores]}")

Train model
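Two details of the cross-validation step are worth making explicit. `cross_val_score` fits fresh clones of the estimator on each fold, so its scores do not come from the already fitted `rf_model`, and for a classifier an integer `cv` uses stratified folds without shuffling. The sketch below shows an explicit, shuffled `StratifiedKFold` on synthetic data; `X_demo`, `y_demo`, and `model` are hypothetical stand-ins for the Titanic features and the post's model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data with 6 features (stand-in for X_train, y_train)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

model = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)

# Explicit splitter: stratified folds, shuffled with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")

print(f"Mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})")
# The original estimator is left unfitted because only clones were trained
print("model fitted by cross_val_score:", hasattr(model, "estimators_"))
```

In the post's notebook this means the `fit` call and the cross-validation are independent: the first produces the model used for prediction, the second only estimates how well that configuration generalizes.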

2.4 Check feature importance

# 4. Check Feature Importance
feature_importance = pd.DataFrame(
    {"feature": features, "importance": rf_model.feature_importances_}
).sort_values("importance", ascending=False)

print("📊 Feature Importance:")
print(feature_importance.to_string(index=False))

Feature importance
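`feature_importances_` is impurity-based and computed on the training data, and the scikit-learn documentation notes that it can favor features with many distinct values. Permutation importance on held-out data is a common cross-check. The following is a sketch on synthetic data, assuming a simple train/validation split; none of the variable names come from the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic features
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_fit, y_fit)

# Shuffle each feature 5 times on the validation set and measure the score drop
result = permutation_importance(model, X_val, y_val, n_repeats=5, random_state=0)

for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature_{idx}: {mean:.4f} +/- {std:.4f}")
```

If the two rankings disagree strongly for a feature, that is a hint to treat its impurity-based importance with some caution.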

3. Export predictions & submit to Kaggle

3.1 Run predictions

# 5. Run predictions
predictions = rf_model.predict(X_test)
# Class probabilities; not needed for the submission, but useful for inspection
prediction_proba = rf_model.predict_proba(X_test)

print("✅ Prediction complete!")
print("\nSummary of prediction results:")
print(f"  - Total number of predictions: {len(predictions)}")
print(f"  - Predicted survived: {(predictions == 1).sum()}")
print(f"  - Predicted died: {(predictions == 0).sum()}")
print(f"  - Predicted survival rate: {(predictions == 1).mean() * 100:.2f}%")

Predictions
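The probabilities returned by `predict_proba` can be used to see how confident the model is, or to apply a decision threshold other than 0.5. The columns follow the order of `model.classes_`. This is a sketch on synthetic data; the 0.7 threshold is an arbitrary illustration, not a recommendation from the post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Titanic data (labels are 0 and 1)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=1)
model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_demo, y_demo)

proba = model.predict_proba(X_demo)  # shape: (n_samples, n_classes)

# Look up which column holds the probability of class 1 ("survived")
positive_col = list(model.classes_).index(1)
p_survived = proba[:, positive_col]

default_preds = (p_survived > 0.5).astype(int)
strict_preds = (p_survived > 0.7).astype(int)  # arbitrary stricter threshold

print("Predicted positives at 0.5:", int(default_preds.sum()))
print("Predicted positives at 0.7:", int(strict_preds.sum()))
print("Rows sum to 1:", bool(np.allclose(proba.sum(axis=1), 1.0)))
```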

3.2 Create the submission dataframe

# 6. Create a result dataframe
result_df = pd.DataFrame(
    {
        "PassengerId": test["PassengerId"],
        "Survived": predictions,
    }
)

print("📋 Sample prediction results (first 10):")
print(result_df.head(10))

print("\n📋 Sample prediction results (last 10):")
print(result_df.tail(10))

Result dataframe

3.3 Write the result back to a Foundry dataset (Export)

# 7. Export results (write the predictions to the gender_submission dataset)
gender_submission = Dataset.get("gender_submission")
gender_submission.write_table(result_df)

Write dataset

3.4 Submit to Kaggle

1) Go back to the project home and click the exported gender_submission dataset.

Select gender_submission

2) Click All actions → Download as CSV.

Download as CSV

3) On Kaggle, open Submit Predictions, upload the CSV, and submit.

Submit Predictions
Upload CSV

4) You should see a score (e.g., 0.76315).

Score
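Before uploading, the downloaded CSV can be sanity-checked against the format the Titanic competition expects: a `PassengerId` and a `Survived` column, one row per test passenger (418 for the full test set), and `Survived` values of 0 or 1. The checker below is a minimal sketch using only the standard library; `validate_submission` is a hypothetical helper, and the sample text stands in for the real file contents.

```python
import csv
import io


def validate_submission(csv_text: str, expected_rows: int) -> list[str]:
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["PassengerId", "Survived"]:
        return [f"unexpected header: {reader.fieldnames}"]

    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    ids = [row["PassengerId"] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate PassengerId values")

    bad = [row["PassengerId"] for row in rows if row["Survived"] not in {"0", "1"}]
    if bad:
        problems.append(f"non-binary Survived for PassengerId: {bad}")
    return problems


# Hypothetical three-row samples; the real file would be checked with expected_rows=418
good = "PassengerId,Survived\n892,0\n893,1\n894,0\n"
bad = "PassengerId,Survived\n892,0\n893,1.0\n894,0\n"

print(validate_submission(good, expected_rows=3))
print(validate_submission(bad, expected_rows=3))
```

For the downloaded file, the text could be read with `open(path, newline="").read()` and passed in the same way.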

From here, you can improve the score by iterating on features, model choice, and hyperparameters.
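One way to iterate on hyperparameters is a small grid search scored by cross-validation. This is a sketch on synthetic data with an arbitrary, deliberately tiny grid; the parameter values are illustrative assumptions, not tuned settings for the Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train, y_train
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Small illustrative grid: 2 x 2 = 4 candidate configurations
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 10]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```

After fitting, `search.best_estimator_` is a model refit on all the data with the best parameters, so it could be used for prediction in place of a hand-configured model.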


4. Visualize results in Workshop (Ontology)

4.1 Create an Ontology object type (connect the result dataset)

1) In Application, search for ontology → open Ontology Manager.

Ontology Manager

2) Click New → Object type.

New object type

3) Choose Use existing datasource → Select datasource → pick gender_submission.

Select datasource
Select gender_submission

4) Click Next, then click Next again on Step 2.

Next
Step2 Next

5) Set Primary key and Title to Passenger Id → click Create.

Primary key
Create ontology

6) Click Save.

Save

4.2 Build a Workshop dashboard

1) In Application, search for Workshop → click Create new.

Create workshop

2) Click Save.

Save workshop

3) On the left, click Add widget → choose Object list.

Add widget - object list

4) To set the input data, click New object set variable.

New object set variable

5) Click Select starting object set, search for gender, and select the gender_submission object type.

Select object set

6) Click Add property → select Survived.

Add property
Object list result

The list now shows the predicted Survived value for each passenger.

7) On the right, click Add widget → choose Chart: Pie.

Add widget - pie

8) Set the input to gender_submission and set GROUP BY to Survived to visualize the survival/death ratio.

Pie chart

9) You now have a simple dashboard. If needed, you can also deploy it so others can access it.

Dashboard
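The pie chart in step 8 amounts to counting objects grouped by Survived, so its slices should match the prediction summary printed in the notebook. A minimal sketch of that aggregation, using a short hypothetical list of predictions in place of the real column:

```python
from collections import Counter

# Hypothetical Survived values; in practice this would be result_df["Survived"]
survived_flags = [0, 1, 0, 0, 1, 0, 0, 1]

counts = Counter(survived_flags)  # group by value and count
total = sum(counts.values())
shares = {label: counts[label] / total for label in sorted(counts)}

for label, share in shares.items():
    print(f"Survived={label}: {counts[label]} passengers ({share:.1%})")
```

If the dashboard shows different proportions than the notebook, the object type is probably backed by an older version of the dataset.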

Wrap-up

This completes the Kaggle Titanic workflow in Foundry: preprocessing → training and prediction → Kaggle submission → dashboard visualization. In the next post, we can extend this by improving model performance (feature engineering, validation strategy, and model comparisons).