Data, data, data. It seems like that’s all you ever hear about these days. Lately, the news hasn’t been particularly flattering. Yet the fact remains that the world is just at the beginning of the data revolution, and data will play a larger role in our daily lives, particularly as the fields of artificial intelligence and machine learning mature. Making sense of the deluge of data and understanding how to interpret it will influence how we see and interact with our digital world.
Sarah Steele knows a thing or two about data, given that she’s a… well… a data scientist at a travel startup called Fareportal. We threw her a few questions and she graciously discussed her experiences in the industry and working with data.
SCIENTIFIC INQUIRER: For those unfamiliar, what exactly is Data Science and what does it involve? What tools do you use?
SARAH STEELE: Simply put, data science is when you closely examine a large mass of data and use it to extract relevant insights. It’s basically one big puzzle and when you figure it out, your brain does this “click” thing. It’s tedious, but very satisfying when it comes together. Luckily, there are a ton of tools to help you get there. Academia uses a lot of MATLAB and legacy programs, but I like that open source communities have developed some of the most widely used algorithms. The increasing accessibility of toolkits like Anaconda, with consumer- and enterprise-grade installation packages, makes things so much easier. The data science community has really grown into a supportive environment with easy access to documentation and support (e.g., Stack Exchange versus IRC).
SCINQ: Can you discuss your work at Fareportal? Where do data science and travel fares intersect?
SS: I began working part time at Fareportal in August 2016 and full time in September 2016. I am developing my leadership skills and learning how to coordinate a team to fulfill a goal, and I am getting more involved in the planning and management of projects. I love agile. In addition, our team is using some great technologies in the field: Apache Spark, Amazon Web Services, etc. We use distributed processing to crunch tons of data quickly and make it available for consumption by business users. These tools help us stay on top of the latest trends and apply them to our business.
SCINQ: It’s often suggested that men and women differ in their approaches to certain aspects of problem solving. Does this hold true in Data Science? Do you notice differences in the workplace?
SS: The Fareportal workplace is intensely multi-cultural so there tend to be a lot of different groups and identities. I think socially constructed identities certainly influence how people perceive what they are doing and how they are going about it as well as how they perceive others. However, my research in statistics and perception has shown that people very often see patterns that are not there, and that they often miss patterns that have a huge impact on outcomes. In general, my philosophy is that for most social categories, the average difference between any two members of the same “group” is often just as large as the difference between groups, and you should never assume you know someone’s capabilities or preferences just based on their cultural identity. I personally have a good time working with both men and women, as long as they are enthusiastic and engaged.
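The statistical point in this answer, that differences within a group can be as large as or larger than the difference between groups, can be illustrated with a small sketch. The two groups and their scores below are made-up numbers chosen purely for illustration; they are not data from Fareportal or from Steele’s research.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_gap(values):
    """Average absolute difference over every pair of members in one group."""
    return mean(abs(a - b) for a, b in combinations(values, 2))


# Hypothetical scores for two groups (illustrative values only).
group_a = [50, 60, 70, 80, 90]
group_b = [55, 65, 75, 85, 95]

within_a = mean_pairwise_gap(group_a)
within_b = mean_pairwise_gap(group_b)
between = abs(mean(group_a) - mean(group_b))

print(f"typical gap between two members of group A: {within_a}")
print(f"typical gap between two members of group B: {within_b}")
print(f"gap between the two group averages:         {between}")
```

With these invented numbers, two members of the same group differ by much more, on average, than the two group averages do, so knowing someone’s group says little about that individual.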
SCINQ: Presumably, other companies have access to similar data (airfare, hotel rates, etc.). How does Fareportal’s approach differentiate it from its competitors?
SS: What I have seen here at Fareportal is an incredible transformation from application-minded or transaction-minded thinking to customer-minded thinking. In order to provide the best service to our customers and have them coming back for more, we are integrating data from many different silos and creating a unified picture of the business from their perspective. Building the data architecture to support real time decisions about our core business, and understanding what the people on the other end of the screen want, has required creativity, discipline, and process-building. I have had the privilege of working with a ton of very clever and indomitable people across the business to build out sophisticated views of our business around our customers.
SCINQ: What are the limits to which data can be mined? Is every answer ultimately in the data?
SS: Every data set is a filtered view of reality. In order to make a measurement, you have to specify what you are keeping, and what you are throwing away. This is a fundamental aspect of perception that our brains do a very good job of hiding from us. A good scientist is always skeptical of what data WASN’T selected and what dimensions WEREN’T measured. I like the parable of the elephant – many blind men put their hands on an elephant; one is certain that he is touching a column (leg), another a rope (tail), another a piece of cloth (ear), etc. There may be such a thing as “the data”, but all we ever have access to is a small subset of it, and there are always questions that cannot be answered. When it comes to data science and machine learning, the trick is usually in figuring out EXACTLY what your question is, and getting the right data for the right question.
SCINQ: Take us into the mind of a data scientist… Do you see everything in terms of data?
SS: Probably more than most people feel comfortable around, but certainly not everything. There is an art to data science. Sometimes my decisions are based on my best assessment of the best solution, sometimes it’s what people are most excited about. It is always important to keep in mind that we are trying to tell a story. Story and personality and relationships are extremely important in making the whole thing run smoothly, and these require listening and intuition and understanding people on a human level. For what it’s worth, these are things women are “supposed to” excel at that I am sometimes a bit clunky with. I tend to dive deep into a problem and realize things have shifted by the time I come out of my “trance” for air. There is a different kind of data processing going on here, in which you gather evidence and try to cultivate good will and reputation, but it’s not data that you can download.
SCINQ: How does data science influence our everyday lives?
SS: This is sort of hard to describe. Surprisingly, I think it’s a bit like the frog in the boiling water. Things have changed a lot because of new products and technologies emerging from data science, but at a consumer level some of it is hard to perceive because it is automating small things that we don’t really miss doing, like Google’s Assistant, spam blockers, etc. I used to have to choose which album I wanted to listen to; now Spotify picks it for me. In terms of gross changes to the industrial economy, e.g., self-driving cars, I don’t even really want to speculate, because I think I would almost certainly be wrong. Skynet or Star Trek? No, it’s probably going to work out to be something we can’t even imagine yet. I still want my flying car, though.
SCINQ: What is the relationship between quality and quantity of data? As the amount of data gathered increases, does it necessarily follow that quality does as well?
SS: No. Especially with redundancy without a single source of truth. You can wind up with many copies of what is supposed to be the same thing, each of which says something different, and it’s impossible to reconcile them. In general, it’s easier to make a lot of very low-quality data than a lot, or even a medium amount, of very high-quality data. It doesn’t mean you can’t use it; you just need to have a relatively simple model and a very strong engine.
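To make the “many copies, each saying something different” problem concrete, here is a minimal sketch that flags records whose copies disagree. The source systems, field names, and values are all hypothetical, invented for illustration; nothing here reflects Fareportal’s actual systems.

```python
from collections import defaultdict

# Hypothetical records: the same bookings as stored in separate systems.
records = [
    {"source": "billing", "booking_id": "B1", "fare": 420.00},
    {"source": "crm", "booking_id": "B1", "fare": 402.00},
    {"source": "warehouse", "booking_id": "B1", "fare": 420.00},
    {"source": "billing", "booking_id": "B2", "fare": 150.00},
    {"source": "crm", "booking_id": "B2", "fare": 150.00},
]


def find_conflicts(rows, key, field):
    """Return {key: sorted distinct values} for keys whose copies disagree."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


conflicts = find_conflicts(records, "booking_id", "fare")
print(conflicts)
```

A check like this only detects the disagreement. Deciding which copy is right still requires designating one system as the source of truth, which is the part the answer describes as hard.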
SCINQ: Data has been in the news a lot lately, particularly with the Facebook situation regarding privacy and how data is used. A few days ago, at the Credit Suisse Asian Investment Conference, a speaker predicted that China would ultimately win the AI race. His basic argument revolved around the fact that America still had privacy rules regarding data collection and use (ultimately a limiting factor), whereas China has fewer such rules and can freely collect as much data as it wants. Is he correct?
SS: I’ve used the internet in China, and it was not that great. The providers may be allowed to collect as much information as they want, but the users don’t seem to be able to access much at all, and it’s certainly not reliable. I don’t know about Credit Suisse, but my impression of China’s tech culture is that they’re missing a lot of the consumer applications and “delight” that drive the kind of mass scaling technologies you get when you build, say, a Twitter. I am impressed with the state’s emphasis on education and the cultural values around it that I can observe. However, I have a lot of their data scientists on my team so… who’s winning?