What is Data Science?

A Conversation with Ankur Teredesai, Data Scientist, nPario

These days, the term “big data” is all the rage, and more than a dozen data management platforms are competing for the right to manage audience segmentation, targeting, analytics, and lookalike modeling for advertisers and publishers. I recently sat down with noted data scientist Ankur Teredesai to better understand data science and how ad technology companies are using its principles to help publishers and marketers understand audiences. Ankur is the head of data science for nPario, a WPP portfolio company focused on data management, and he is also a professor at the University of Washington.

You hear lots of digital advertising people talk about “data science.” Does their perception differ from the broader, academic understanding of what data science is? Is the notion of utilizing data science in digital marketing applications new?

Ankur Teredesai (AT): It’s very interesting to see that digital advertising folks have started deep conversations with data scientists. Data science is a very interesting space these days: data mining and database management technology are now helping a variety of disciplines establish a scientific approach to decision making. Using data science to find patterns in large amounts of data is not a new concept for digital marketing. What is new is the advent of technologies that support finding useful patterns in data of great variety and velocity, not just volume, thereby advancing the state of the art in marketing analytics.

Do ad technology companies really rely on data science? What does being a data-driven organization really mean?

AT: No ad-tech company can afford not to rely on data science in some shape or form. The power of predictive modeling is quickly separating the players who are making rapid inroads by picking the low-hanging fruit of data science in these domains from those who treat data science as a passing buzzword. My advice to all ad-tech companies is to get at least one data scientist in their ranks, even if they don’t like the term “data science” for one reason or another. Machine learning, data mining, and big data analytics are all equally acceptable terms today.

Describe the concept of data modeling for the non-academic user. What kinds of models are being built for digital marketing applications?

AT: A variety of problems in digital marketing are being addressed using predictive modeling. Some examples of the work we are doing at nPario are (a) look-alike modeling that helps find targetable audiences to “rightsize” or expand a particular segment, (b) recommending cost-aware segments that are similar to desired audience segments for targeting, (c) providing comparative analytics for exploring the unique properties of a given segment, and (d) enabling real-time audience classification to reduce the time to target efficiently and effectively.

Is the concept of lookalike modeling legitimate? How does this work? Is LAM a scalable targeting practice?

AT: Given customer behavior data, the lookalike model estimates and exploits the variations and similarities in behavior across various segments. Once the model figures out which similarities or differences are robust enough in the audience to be useful for predicting future behavior, it exploits these attributes in the data to expand a particular segment’s size by including those customers that are similar to the base segment but were not included because they did not meet the segment definition criteria. This allows fairly restrictive criteria to be relaxed using data science methods such as association mining and regression analysis to expand the segment size accurately, confidently, and in a scalable manner.
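For readers who want to see the idea in miniature, here is a rough sketch of segment expansion using one form of regression analysis (logistic regression). It is not nPario’s implementation. The feature names, user IDs, data values, threshold, and helper functions (`fit_logistic`, `score`, `expand_segment`) are all hypothetical and chosen only for illustration. The model learns what separates the base segment from other users, then adds candidates who fall outside the strict segment definition but score as similar.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def score(x, weights, bias):
    """Estimated probability that a user resembles the base segment."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def expand_segment(candidates, weights, bias, threshold=0.5):
    """Return IDs of candidates whose similarity score meets the threshold."""
    return [uid for uid, x in candidates.items()
            if score(x, weights, bias) >= threshold]

# Hypothetical behavior features per user: [share of sports visits, share of finance visits]
base_segment = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3], [0.95, 0.15]]  # met the segment definition
other_users = [[0.1, 0.8], [0.2, 0.9], [0.15, 0.7], [0.05, 0.6]]    # clearly outside it

rows = base_segment + other_users
labels = [1] * len(base_segment) + [0] * len(other_users)
weights, bias = fit_logistic(rows, labels)

# Users who missed the strict definition but may still behave like the segment
candidates = {"u1": [0.7, 0.2], "u2": [0.1, 0.9], "u3": [0.75, 0.25]}
expanded = expand_segment(candidates, weights, bias)
print(expanded)
```

Raising `threshold` keeps the expanded segment closer to the original definition, while lowering it trades accuracy for reach.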

The entire digital advertising ecosystem is driven by data. What are the most valuable types of data for targeting? How do you see the future for the ecosystem? Will those with the most data win?

AT: This is central to the success of the entire digital advertising world and to the benefit of end consumers in finding the right, useful advertising. If we are to understand the user and focus on making digital advertising useful without making it adversarial, we have to focus carefully on the types and granularity of the data being collected. At both nPario and the Institute of Technology, University of Washington Tacoma, where I hold an academic appointment, we stress the need to develop an ecosystem of data collection, management, and mining that is customer-centric, with the highest regard for security through robust multitenancy, cryptography, and privacy-aware practices. The entire technology stack at nPario is, for example, data agnostic. We decided very early on in the company not to be suppliers of data but to be data-pool neutral, allowing our clients to bring their own first-, second-, and third-party data. Our platforms help clients derive value from a variety of big datasets while ensuring that customer and end-user privacy is preserved and not compromised by our actions in any manner whatsoever. To address your question of whether those with the most data win, I would like to say this: everybody has some data, and some just have lots of data. It is the ones that have the right tools at the right time that will monetize their data the best.

This interview, among many others, appears in EConsultancy’s recently published Best Practices in Data Management by Chris O’Hara. Chris is an ad technology executive, the author of Best Practices in Digital Display Media, a frequent contributor to a number of trade publications, and a blogger.

