#HackInSAS: Utilizing AI/ML and Data Science to Address Today’s Challenges
In April, EPAM’s data science experts participated in SAS’s 2023 Global Hackathon and received multiple awards: as a global technology winner for natural language processing (NLP) and as a regional winner for EMEA. With global participation from the brightest data scientists and technology enthusiasts, SAS hackathons look to tackle some of the most challenging, real-world business and humanitarian issues by applying data science, AI and open source cloud solutions. Teams have just one month to define a problem, collect the data and deliver a proof of concept (PoC). This year, two EPAM teams participated in the event:
- To help disaster relief responders gain valuable insights and better direct their actions during a natural disaster, EPAM’s first team developed a tool to mine and categorize social media requests using NLP. By analyzing satellite imagery (before and after) using the latest computer vision (CV) techniques, the solution helps determine the extent of infrastructure damage to support relief planning optimization.
- EPAM’s second team created a digital twin of the city of Heidelberg, Germany to predict traffic flow, improve safety and aid planning. It uses data from IoT monitoring devices while considering the impact of weather and popular events in the city.
Let’s dive deeper into these two exciting innovations…
Social Listening for Support Services in Case of Disasters
EPAM Senior Data Scientists Leonardo Iheme and Can Tosun partnered with Linktera to create a tool to help decision makers in disaster relief coordination centers make data-driven, informed decisions. The solution harnesses the power of NLP and image analysis to turn the disruption of a natural disaster into actionable insights for coordination centers.
Just before the hackathon, a magnitude 7.8 earthquake struck Turkey, raising serious questions about how to improve the response to natural disasters. We wanted to help and worked with Linktera in Turkey to do just that. The goal is to streamline decision making by grounding it in data, so that resources can be allocated effectively for rapid response to critical situations.
Mining, Categorizing & Validating Social Media with NLP
This concept mines social media platforms, like Twitter, for real-time data, transforming this wealth of information into insights for disaster relief. In today’s connected world, social media has become an essential tool for communication, where people share their experiences, seek help and stay informed. The system features advanced algorithms to filter out misinformation and validate crucial details. The goal is to not only provide verified information to coordination centers but also paint a clearer picture of the situation on the ground.
Applying NLP, we analyzed 140,000 tweets from Turkey’s earthquake to identify the intent of help requests and classify them into relevant categories. To pinpoint the location of those in need, we used named entity recognition to extract addresses from tweets. We then used the Google Maps API to convert these textual descriptions into precise coordinates for mapping.
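To make the categorization and address extraction steps concrete, here is a deliberately simplified sketch. It is not the team's pipeline: keyword matching stands in for the intent classification model, a toy regular expression stands in for named entity recognition, and the category names, keywords and address pattern are all hypothetical. The geocoding step is left out, since it would be a call to an external service.

```python
import re

# Hypothetical categories and keywords; a real system would use a trained
# intent classifier rather than keyword matching.
CATEGORY_KEYWORDS = {
    "rescue": ("trapped", "rubble", "collapsed"),
    "medical": ("injured", "medicine", "doctor"),
    "shelter": ("tent", "blanket", "shelter"),
    "food_water": ("food", "water", "hungry"),
}

# Toy address pattern: "<number> <Capitalized name(s)> Street|Avenue|Road".
# A real system would use a named entity recognition model instead.
ADDRESS_RE = re.compile(r"\b\d+\s+(?:[A-Z][a-z]+\s)+(?:Street|Avenue|Road)\b")


def categorize(text):
    """Return the sorted categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & set(keywords)
    )


def extract_address(text):
    """Return the first address-like span in the text, or None."""
    match = ADDRESS_RE.search(text)
    return match.group(0) if match else None


tweet = "People trapped under rubble at 12 Ataturk Avenue, need water"
print(categorize(tweet))
print(extract_address(tweet))
```

The extracted address string is what would then be passed to a geocoding service to obtain coordinates for the map.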
Assessing Infrastructure Damage with Machine Learning
After the earthquake, satellite image providers quickly made their highly valuable resources publicly available. This is a vital data source to validate and enrich complex social media data, which can help to understand the full extent of the disaster, the infrastructure damage, areas impacted, collapsed buildings and blocked routes that would otherwise hinder emergency response. Using advanced CV techniques, we compared satellite imagery before and after the earthquake. This methodology involves precise location identification, preprocessing of the satellite images and identifying damaged structures using advanced object detection and segmentation models.
To identify buildings within the geospatial data, we used a pretrained deep learning object detection model, such as the open source YOLOv5 architecture, which offers high accuracy and efficiency in detecting structures. Additionally, the team leveraged the latest segmentation model from Meta AI to delineate buildings and assess the degree of damage. The resulting satellite image processing shows the locations and percentage of damage for detected buildings and identifies blocked roads, which empowers stakeholders to make informed decisions.
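One way to turn segmentation output into a damage percentage is sketched below. How the team actually scored damage is not described here, so the metric is an assumption: given binary building masks for the same footprint before and after the event (1 = building pixel), it reports the share of pre-event building pixels that are no longer classified as building.

```python
def damage_percent(before_mask, after_mask):
    """Percent of pre-event building pixels missing from the post-event mask.

    Both masks are equally sized 2D lists of 0/1 values. This metric is an
    illustrative assumption, not the team's documented method.
    """
    building_pixels = 0
    lost_pixels = 0
    for row_before, row_after in zip(before_mask, after_mask):
        for before, after in zip(row_before, row_after):
            if before:
                building_pixels += 1
                if not after:
                    lost_pixels += 1
    if building_pixels == 0:
        return 0.0  # no building in the pre-event mask
    return 100.0 * lost_pixels / building_pixels


before = [[1, 1, 0],
          [1, 1, 0]]
after = [[1, 0, 0],
         [0, 0, 0]]
print(damage_percent(before, after))
```

In practice the two images would first need to be aligned to the same coordinates, which is why precise location identification and preprocessing come before this step.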
Building Data Visualizations for Disaster Relief
You need the right tools to make sense of the data. The PoC has a comprehensive and user-friendly dashboard for the disaster coordination centers to streamline their decision-making process and facilitate effective communication among various teams. The dashboard gathers data from the NLP and image analysis techniques and aggregates it into a single platform with an interactive map that pinpoints the locations of the affected areas, specific requirements and relevant labels to allow decision makers to quickly assess and prioritize their response. It also includes a layer dedicated to satellite imagery to visualize and assess the extent of the damage.
We hope that this design concept extends beyond Turkey’s earthquake to all natural disaster relief efforts.
Mobility Insights Heidelberg – A Digital Twin to Model Urban Traffic Flow
I joined EPAM Data Scientists Denis Ryzhov and Dzmitry Vabishchewich, alongside Digital Agentur Heidelberg and Fujitsu, to develop a digital twin of Heidelberg, Germany. Heidelberg is a popular tourist destination that attracts 14 million visitors annually, almost 100 times its population. Predicting traffic and pedestrian flow by using data from IoT monitoring devices while considering the impact of weather is key to running the city smoothly and effectively. These predictions can enhance tourist experiences and safety, prevent accidents, improve planning for road closures for major events and help with future city planning and development. This is an active area of interest for many cities, but many don’t have the in-house data science and technology experience to accomplish this task alone.
Impact of Weather on Traffic Flow
For the first modeling initiative, the team wanted to understand and predict how weather patterns impact the flow of traffic in the city. IoT sensor data came from a central traffic light control system, cameras on traffic lights, parking garage sensors, bicycle count sensors and pedestrian count sensors. The team used time, weather and city event data to generate a decision tree and then further improved the model by partitioning and using gradient boosting.
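The idea behind gradient boosting can be shown with a small from-scratch sketch. This is not the team's SAS model: it uses depth-one trees (stumps) and squared error, and the features (hour of day, rainfall in mm) and traffic counts are made up for illustration. Each round fits a stump to the current residuals and adds a shrunken copy of it to the ensemble.

```python
def fit_stump(rows, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:] if best else None


def fit_boosting(rows, targets, n_rounds=50, learning_rate=0.1):
    """Fit an additive ensemble of stumps on the residuals of earlier rounds."""
    base = sum(targets) / len(targets)
    preds = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        j, threshold, left_mean, right_mean = stump
        stumps.append(stump)
        preds = [
            p + learning_rate * (left_mean if row[j] <= threshold else right_mean)
            for row, p in zip(rows, preds)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left_mean if row[j] <= threshold else right_mean)
        for j, threshold, left_mean, right_mean in stumps
    )


# Made-up data: (hour of day, rainfall in mm) -> vehicle count.
rows = [(8, 0), (8, 5), (14, 0), (14, 5), (20, 0), (20, 5)]
counts = [100, 80, 60, 50, 90, 70]
model = fit_boosting(rows, counts)
print(predict(model, (8, 0)), predict(model, (8, 5)))
```

A single decision tree corresponds to stopping after one (deeper) split structure; boosting improves on it by letting later trees correct what earlier ones got wrong.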
Predicting Parking Space Availability
For the team’s second modeling initiative, the goal was to add a predictive parking availability function to guide motorists to parking spaces. By the end of the hackathon, the models were available on the city’s website to provide long-term prediction of parking availability. The team built the model in parallel in Python and in the SAS model builder, using random forests. The short-term model learns patterns of occupancy from lag features. Extrapolating a curve in response to unprecedented patterns improved the model, and adding weather and event data improved it further.
Forecasting Short-Term Traffic
For the third modeling initiative, the team looked at traffic flows in the city and generated short-term predictions to forecast traffic within the next three hours. The team ran a successful experiment in creating a model to predict traffic at each sensor location within the city, demonstrating that high-quality models can be attained for multiple locations. This stream was particularly challenging due to gaps in the data, which we overcame by carefully selecting the analysis tools and techniques and filling gaps where necessary. The team used a light gradient boosting machine (LGBM), a tree-based model, with lag features and rolling-window statistics, because this combination is quick to implement and works well for time series problems.
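A minimal sketch of the feature preparation described above, for a single sensor. The gap-filling method the team used is not stated, so forward-filling is an assumption, and the lag choices, window size and sample counts are illustrative. The resulting rows are the kind of table a tree-based model such as LGBM would be trained on.

```python
def fill_gaps(series):
    """Forward-fill missing readings (None). Leading gaps stay None."""
    filled = []
    last = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def make_features(series, lags=(1, 2), window=3):
    """Build one row per time step from past values only.

    Each row holds the lagged values, the mean of the previous `window`
    readings, and the current reading as the prediction target.
    Expects a series without None values.
    """
    rows = []
    start = max(max(lags), window)
    for t in range(start, len(series)):
        past = series[t - window:t]
        row = {f"lag_{k}": series[t - k] for k in lags}
        row["roll_mean"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


# Hypothetical hourly vehicle counts with two missing readings.
counts = [10, None, 14, 18, None, 20]
filled = fill_gaps(counts)
for row in make_features(filled):
    print(row)
```

Because every feature is computed from readings strictly before time t, the model never sees the value it is asked to predict.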
EPAM is proud of our hackathon teams and their award recognition from SAS. The teams delivered exciting PoC innovations using the latest AI technologies and data science practices to address today’s most challenging, real-world business and humanitarian issues. As always, we hope to inspire and foster technology innovations, and look forward to another great competition in next year’s #HackInSAS.