Scaling XGBoost for thousands of features with Databricks

In this talk, we discuss an online advertising use case that allows marketers to target users based on demographic information, along with the ML modeling behind it. After a brief background on the problem, we take a deeper dive into the modeling itself and how we scaled the training of an XGBoost model to handle thousands of features. We cover mistakes made along the way and useful lessons learned during the process, as well as some common bad practices to avoid and some noticeable differences between the Python and Scala implementations of XGBoost in Spark.

Speakers:

Phan Chuong is a data engineer on the T-Mobile Marketing Solutions team and an Apache Spark expert. His focus is scaling machine learning models and deploying them into production to support marketing decisions with data insights. With broad experience ranging from data warehousing and data analysis to software and product development, Phan builds strong connections between teams and helps build complete machine learning products end to end.

Eric Yatskowitz is a data scientist on the T-Mobile Marketing Solutions team. Eric has a background in mathematics and experience working on a diverse set of problems including computer vision, predictive maintenance, and marketing analytics.
