A Candid Conversation about Data for Public Good
Talking Data and International Women’s Day with Two Female Data Pros
With International Women’s Day around the corner, we thought it the right moment to convene two women who are thoughtful, civic-minded, globally-focused data professionals for a public conversation about their work. Sarah Williams is the Associate Professor of Technology and Urban Planning at MIT, Director of the Civic Data Design Lab, and author of Data Action: Using Data for Public Good, and Tatsiana Barysavets is a Senior Manager, Data Analytics Consulting, at EPAM. The resulting conversation is not just good for the public, but good for you, the reader, as well. Enjoy.
Sarah: You define data for the public good as “practices that seek to do no harm, respond to the needs of those on the margins of society, expose unjust practices, and ultimately help educate us about our world so we can make better decisions.” Can you talk about a favorite project that does this?
Sarah Williams (SW)
A good example might be the Digital Matatu Project I’ve done in Nairobi, Kenya. It stands out to me because of the multiple ways in which the data was able to work for the public good. We collected data about the informal transit system in Nairobi, the matatus. These are small buses or even minivans, and they are what most Nairobians depend on to get around. For a long time, there wasn’t any data available about the routes of these buses. Imagine not having a schedule for your bus or your subway! And so we created the data about where these buses went using an app that people carried on their cell phones.
We opened that data up for anyone to use but also created a map. The visualization and communication of data here were important. This is because not all of us can be data scientists, but understanding the insights of data through visualization brings the power of data to everyone. In this project, visualizing the map of the routes not only provided a guide to citizens but it showed NGOs and outside funders that matatus are an organized system. Previously, many NGOs didn't realize that there's a matatu organization, that they set routes and stops, and that, in fact, it is, in a way, a planned system. The maps helped them realize if they're going to improve the transit system, it’s important for them to engage with the matatu association, whether it's a privatized organization or not.
The visualization helped many of those people who want to impact transportation in Nairobi. It gave them a path to do that. The visualization also became a popular icon for the city, which was great. Much in the way New Yorkers “wear” their subway maps, we would see the map of Nairobi around.
We also opened the data up for anyone to use. We held a hackathon so that the local tech community, which is super vibrant in Nairobi, had access to it. Now there are 10 different apps that use the data as a base. There's a Waze-like app so that you can figure out which matatu to take based on traffic that day. This became the first informal transit system in Google Maps, so now you can actually navigate yourself.
Part of the impact was in the way that we designed the data collection exercise. We included all of the partners, the government, the local tech community, local advocates, and NGOs. A lot of them along the way, as we were developing this data, thought: “What is this? Why is this important?” And then once they saw the map and the data, they saw the impact it could have. When the government, which was largely uninterested the whole time, saw the release of the map, they made it the official map of the city. They were kind of saying, “We made the map,” and I think that's exactly the result we wanted.
Can you two talk about ethics and its relationship to data collection?
Tatsiana Barysavets (TB)
In my opinion, data should be treated as a corporate asset. I am personally very inspired when I see how organizations leverage data to build synergies across business units, optimize processes, and make better and faster business decisions while collaborating on the same data. However, proper data governance should be in place to avoid unauthorized use of data and exposure of sensitive information. In any project which involves data collection, you must answer these simple questions: “What is the nature of the data that has been collected and processed? Who is going to use this data, and how? Are there any country- and/or organization-specific data standards and policies to be followed?” These answers will drive the creation of proper data access policies and data security and protection standards, and ensure the right design of the data platform. “Data privacy by design” is one of the key principles we always apply on any data project.
SW
I say that data are people. Whether it's us logging into Facebook, or our cell phone records, or even the tracking of matatu rounds, the data represent our movements and represent us. So think of the data as people in themselves. When I'm working on a data project, I think: “Is there potential to expose a person to harm?” And then I think about ways in which my work will impact other people, people not in the data set. Or: “Who's missing from that data set?” and how exposing them could marginalize them. Communicating with the people whom the data represent is also important. We must ask: “Does our analysis ring true?”
Sarah, you write about “ground truthing” data analytics. Have you ever had a situation in which you collected some data, did the ground truthing, and learned that you were completely wrong and you had to start all over again?
I geocoded all the images in the Getty Images database that deal with arts and culture, thinking it would give me an idea of the hot spots in New York City and LA for arts, culture, and creation. This is a project I worked on with my colleague, Elizabeth Currid-Halkett, who talks a lot about how arts and culture help spark innovation in cities. What my analysis of that data actually identified was the way that the press or marketers use space in New York City to market the arts. So what was coming up as a hot spot was Rockefeller Center or the Meatpacking District: not places we associate with art in New York City, but rather places that we associate with the iconic nature of New York. I was getting a sense of how the media sells arts and culture in New York, by associating it with parts of the city that are iconic to those outside the city. I should have known this would be the result because I was using a media data set. It was still interesting because I saw how media markets work and use the city to sell their products, but our question became much different. [Laughter.] But always investigating your biases and your data is important.
On the subject of biases: Have either of you faced particular challenges in a male-dominated field like data? And if not, are there any particular benefits to being a woman in this world?
TB
In my department, we have more women than men. I believe you can work and be successful in any environment if you love what you do, are able to explain why certain things are important to you, and are always honest. Whether you work with a man or a woman, you are a human first. Personally, I haven't experienced any challenges, and I'm happy to work with anyone and make a contribution.
SW
When I started working with data science, I had great mentors who helped me envision my career. Sometimes, I do see in my students a feeling of, “Data science isn’t for women.” I am not sure where that message comes from. There is a barrier that I am sure came from sexist comments they may have internalized. And I realized that what they need is to see other women doing data science to envision their future in it. Seeing role models really helps increase the number of women who work in this area, and I am excited that these days many of the TAs for my classes are women, helping to shape the mentorship. If you look at the statistics, there are definitely fewer women than men in data science, but our numbers are increasing all the time.
I feel particularly lucky to never really have felt big barriers in entering the field. I hope I create an environment for my students to feel that way as well.
Sarah, you've written that data sharing is essential to “building strong deliberative communities,” adding “but most people do not have the skills to parse the data whether big or small.” What are the biggest challenges in communicating and sharing data?
Data sharing is probably the most fun part of my work. I’m really a designer at heart. I mean, I'm a data scientist but I love to design.
As designers, we always struggle to figure out not just, “What is the best way to communicate?” but also: “How can we make sure that we do it in a way that doesn't expose anybody?” Every time we produce a map, I think, “Could this cause anybody harm?”
There was the Million Dollar Blocks project I worked on with Laura Kurgan. Laura and I exposed those areas where the government spent the most to incarcerate people. We showed that often these areas do not receive funding to address the systemic reasons for high incarceration rates. When we made the maps, we asked: “Is there potential to harm the people we’re talking about?” The idea was to show that our response to poverty is to put people in prison… but then by exposing that, would we be further marginalizing the community? Ultimately, we believed that our maps helped tell an important story of the needs of the neighborhoods.
I think that's the bigger struggle: “How much are we being provocative, as opposed to how much could we potentially cause things to happen with that data?” One of the most important things to think about when you're struggling with that question is, “What is the narrative around the data visualization that you create?” In the case of the Million Dollar Blocks, we presented it as showing this inequity, and so the maps are meant to uplift the community.
Tatsiana, you speak four languages and you've held a number of different data-related jobs. Can you talk about how your diverse background and knowledge have helped you here, in terms of communication?
Working in this industry for more than 10 years, I've been lucky to collaborate with people from, I guess, more than 40 countries. I’ve learned the culture of different people and different thinking processes. This has helped me a lot in communicating the results of the data. It’s important to show key messages in a clear and simplified way and deliver them to your audience in the right fashion.
It's always important to ask, “Who is my audience? Who will consume this data? What kind of insights do we want to deliver with the data? And what is the most efficient way to do it?” As for the last question: It can be a simple visualization. It can be a text message. It can be anything. But you always need to know your audience and the type of insights they want to receive.
Being able to work in different contexts, with different countries, has made my experience richer and helped me improve the way I communicate information in any situation.
Sarah: You’ve talked about the desire that some companies have for getting involved in data action. But one of the things we need to do is find the proper way to incentivize them to get involved. Any thoughts on the first steps we might take toward bringing companies into the world of data action?
The biggest issue is licensing, and ensuring that people who use this data are protecting people's rights. The first step is creating what are called “data intermediaries.” These are people or organizations who protect and redistribute aggregated data in a way that retains privacy. A great example of a data intermediary is SharedStreets, which takes a lot of the micromobility data and aggregates it to meet data privacy standards. With micromobility, there are examples of tracing people, for instance the tracking of a young individual to an abortion clinic. There’s a lot that can be exposed. So these data intermediaries can help companies and governments share their data ethically. In addition, when companies try to share data there can be delays. It takes almost two years to negotiate a license, and by that time the insights might already be out of date.
Maybe we can even think about incentives for private companies to share data, in terms of tax credits or other kinds of, let’s say, philanthropic returns for companies. At the heart of it, sharing data is not a core part of their business, and contributing involves a lot of risk on their part, so why would they do that unless there's maybe a philanthropic kind of mission behind it? So let’s properly incentivize them; we could use more partners.