
Scaling XGBoost for thousands of features with Databricks



In this talk we discuss an online advertising use case that lets marketers target users based on demographic information, along with the ML modeling behind it. After a brief background on the problem, we take a deeper dive into the modeling itself and how we scaled the training of an XGBoost model to handle thousands of features. We cover mistakes made along the way and useful lessons learned during the process, as well as some common bad practices to avoid and some noticeable differences between the Python and Scala implementations of XGBoost in Spark.


Speakers:

Phan Chuong is the data engineer on the T-Mobile Marketing Solutions team and an Apache Spark expert. His focus is scaling machine learning models and deploying them into production to support marketing decisions with data insights. With broad experience ranging from data warehousing and data analysis to software and product development, Phan builds strong connections between teams and helps build complete machine learning products end to end.

Eric Yatskowitz is a data scientist on the T-Mobile Marketing Solutions team. Eric has a background in mathematics and experience working on a diverse set of problems including computer vision, predictive maintenance, and marketing analytics.
