Facilitating customer service monitoring by automatically classifying reasons for calling

@ Orange


After the teams showcased a first ML project, we were entrusted with two projects, including this one, which we carried out for Orange's B2B customer service department.


After each call to customer service, phone counselors have to select the reason for calling from a list and summarize what happened in a free-text field.

The customer service department then realized that counselors rarely used anything beyond the top 10 entries of the list, because the list was too long and complex and counselors were incentivized to work fast. However, they usually filled in the free-text field well.

The goal of the project was to use this free-text field to automatically identify the reason for calling, saving time for phone counselors and making it easier to track the reasons for calling and how they evolve.


A data scientist had already built an MVP in Python, and we were asked to industrialize it. Because our platform used Spark/Scala, which did not have the same algorithms implemented, we had to rebuild the entire project.

It ended up being as much a Data Science project as a Machine Learning Engineering project.

The project was framed as an NLP classification problem.

The customer service department had built a dataset of 5,000 labeled comments. That dataset was very time-consuming to build, and it was not large enough to yield sufficient results.

We decided to build a kind of semi-supervised algorithm in which the model's predictions could be used to speed up the labeling process.
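To make the idea concrete, here is a minimal sketch of one common form of semi-supervised learning, self-training, written in plain Python rather than the Spark/Scala stack used on the project. Everything in it is an assumption made for illustration: the whitespace tokenizer, the small Naive Bayes class, the `self_training` function, the confidence threshold and the sample comments are all hypothetical and are not taken from the actual implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Turn log scores into probabilities (shift by the max for stability).
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_training(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the labeled set with confident predictions (hypothetical helper).

    labeled: list of (text, label) pairs; unlabeled: list of texts.
    Returns the final model, the grown labeled set, and the texts that
    stayed below the threshold and would go to a human for review.
    """
    labeled, remaining = list(labeled), list(unlabeled)
    model = NaiveBayes()
    for _ in range(max_rounds):
        model.fit(*zip(*labeled))
        confident, uncertain = [], []
        for text in remaining:
            proba = model.predict_proba(text)
            label = max(proba, key=proba.get)
            bucket = confident if proba[label] >= threshold else uncertain
            bucket.append((text, label))
        if not confident:
            break  # nothing new to add, stop early
        labeled.extend(confident)
        remaining = [text for text, _ in uncertain]
    model.fit(*zip(*labeled))  # final fit on everything accepted so far
    return model, labeled, remaining


# Invented sample comments, for illustration only.
seed = [("invoice amount wrong billing", "billing"),
        ("internet line down outage", "outage")]
pool = ["billing invoice error", "line outage since monday", "hello"]
model, grown, to_review = self_training(seed, pool, threshold=0.75)
```

In this sketch, comments predicted with a probability at or above the threshold are added to the training set automatically, while the rest are returned for manual labeling. A variant with a human check would instead show every prediction to a counselor for confirmation before adding it.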


Before using the semi-supervised loop, we reached 70% accuracy on the top 22 labels, which account for 75% of the reasons for calling.
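As an illustration of how figures like these can be computed, the sketch below derives the share of records covered by the N most frequent labels and the accuracy restricted to those labels. This is only one plausible reading of the numbers above; the function names and the toy data are hypothetical, and the project's actual evaluation code may have differed.

```python
from collections import Counter


def top_label_coverage(labels, n):
    """Return the n most frequent labels and the share of records they cover."""
    counts = Counter(labels)
    top = [label for label, _ in counts.most_common(n)]
    covered = sum(counts[label] for label in top)
    return top, covered / len(labels)


def accuracy_on(label_subset, y_true, y_pred):
    """Accuracy over the records whose true label is in label_subset."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in label_subset]
    return sum(t == p for t, p in pairs) / len(pairs)


# Toy data, for illustration only.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]
top, coverage = top_label_coverage(y_true, n=2)
restricted_accuracy = accuracy_on(set(top), y_true, y_pred)
```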

The most promising improvements were growing the labeled dataset through the semi-supervised loop, with or without human checks, and trying new algorithms.

I accepted a new opportunity before finding out how accurate the algorithms could become after those improvements.

Tools & methods:

Hadoop, Hive, Scala, Spark, Spark ML, Spark MLlib, Semi-supervised learning, NLP, Naive Bayes.

Interested in working together, or would you like to discuss anything?