Near real time flood inundation mapping using social media data as an information source: a case study of 2015 Chennai flood

During and immediately after a flash flood, data on water extent and inundation are usually unavailable because traditional data collection methods fail during disasters. A rapid water extent map is vital for disaster responders to identify the areas in immediate need. Real-time data available on social networking sites such as Twitter and Facebook are a valuable source of information for response and recovery if handled efficiently. This study proposes a method for mining social media content to generate water inundation maps at the time of a flood. The 2015 Chennai flood was taken as the disaster event, and 95 water height points with geographical coordinates were derived from social media content posted during the flood. Of these, 72 points were within Chennai, and a water extent map for the city was generated from them by interpolation. The water depth map generated from social media information was validated using field data. The root mean square error between the actual water height data and the data extracted from social media was ± 0.3 m. The challenge in using social media data is to filter the messages that contain water depth information from the large volume of messages posted during disasters. A keyword-based query was developed and framed in MySQL to filter messages that mention both location and water height. The query was validated with tweets collected during the floods that hit Mumbai city in July 2019. The validation results confirm that the query reduces the volume of tweets requiring manual evaluation and, in future, will aid in mapping water extent in near real time during floods.


Introduction
Karmegam et al. Geoenviron Disasters (2021) 8:25
Climate change, urbanization and other human activities across the globe disturb the hydrological cycle and cause various water-related problems such as water pollution, floods and droughts (Lyu et al. 2019a; Luo et al. 2019, 2020). Cities in particular often face unevenly distributed rainfall, which leads to urban floods (Lyu et al. 2019b; Zou et al. 2020). Flash flood disasters leave a massive social, environmental and psychological impact on the affected community (Duan et al. 2016). Unanticipated heavy precipitation within a short time span, followed by flash floods in urban areas, causes great losses of lives, infrastructure and property (Duan et al. 2014). In the past decade, the occurrence of urban floods has increased drastically across India (Rafiq et al. 2016). Flood extent or water depth details are required immediately after the disaster to identify inundated areas that need quick attention (Blyth 1997). Emergency managers need appropriate and rapid information about the severity of flooding to plan rescue and response operations. Information on variations in flood water depth over time and space is required for effective flood risk management (Luo et al. 2018; Mu et al. 2020). The unexpected, rapid onset of floods in urban localities due to very intense rainfall makes it difficult to obtain water depth information during floods. In general, inundation maps are prepared from field data, remote sensing data and hydraulic models (Grimaldi et al. 2016). Field data are collected by sending field workers to the flooded areas to inspect the highest water marks after the flood, and inundation maps are generated from these observations. Collecting field data involves practical difficulties and fails to provide timely data on flood extent.
For instance, for the 2015 Chennai flood, the authorized official report on the inundation map was released by the Disaster Management Support (DMS) Division, National Remote Sensing Centre (NRSC/ISRO), India, in March 2016, after a field survey carried out from December 24 to 26, whereas the flood occurred on December 2, 2015 (National Remote Sensing Centre 2015). Using remote sensing data for rapid water extent mapping has limitations that include restricted availability (Mason et al. 2012) and limited spatial and temporal resolution (McDougall and Temple-Watts 2012). Apart from these traditional data sources, user-generated, crowdsourced content known as volunteered geographical information (VGI) has also been widely used for water extent mapping and validation (McDougall 2011; Hirata et al. 2018; Rollason et al. 2018). Georeferenced data from social media such as Facebook and Twitter are also considered VGI. The role of social media in the management of disaster situations has been widely researched in the past decade (Lindsay 2011; Verma et al. 2011; Cameron et al. 2012; Middleton et al. 2014; Takahashi et al. 2015; Anson et al. 2017). Social media data originating from the affected population have the potential to aid in creating situational awareness and planning response and rescue operations (Huang and Xiao 2015; Lin et al. 2016; Mart et al. 2017). As social media data are posted in real time, they can be mined for rapid water inundation mapping.
Although social media has been widely used as a tool for information dissemination, early warning and situational awareness during floods (David et al. 2016; Kaewkitipong et al. 2016; Lin et al. 2016; Yadav and Rahman 2016; Alias et al. 2020), its use in water depth mapping is still in its infancy. A few recent studies have investigated the potential of using social media information in inundation mapping (Eilander et al. 2016; Brouwer et al. 2017; Li et al. 2018). Water depth information obtained from social media has been used to validate flood extents derived from other sources and frameworks (Cervone et al. 2016; Smith et al. 2017). Previous research has also examined the possibility of using water logging information from social media along with other sources of information for inundation mapping and risk assessment (Zhang et al. 2016; Rosser et al. 2017; Wu et al. 2018). To the best of our knowledge, the use of social media content for real-time inundation mapping during floods has been little explored in the Indian context. One of the biggest challenges in using social media data during disasters is the huge volume of posts that people share on different aspects of the event. Extracting useful and required information from noisy, voluminous text is a barrier for emergency managers (Hiltz and Kushma 2014). Earlier studies filtered flood-related content based on hash-tags or geographic locations (Lu et al. 2015; Woo et al. 2015; Murzintcev and Cheng 2017), but separating the messages that contain inundation information from the rest of the flood-related posts remains a laborious task.
This article presents the results of a feasibility study on filtering and using social media content for flood mapping in the Indian context. The flooding of Chennai city (Tamil Nadu), India, in December 2015 was one of the most devastating and unexpected flooding events in India. During the Chennai floods, affected people used social networking sites as a communication platform, which helped in identifying people in need. Social media platforms played a major role in rescue and relief operations after the floods (Prakash and Anand 2016). Disaster management stakeholders and volunteers used social networking sites to connect people in need with people who came forward to offer help (Yadav and Rahman 2016). Hence, the 2015 Chennai flood was taken as the case scenario in this study. A flood extent map was generated using social media information on water depths and locations posted during the 2015 Chennai floods and validated against field data collected after the floods. A simple keyword-based query was developed to filter social media data that contain water depth information in the case of urban floods in India. The developed query was also validated with another disaster scenario.

Study setting
Greater Chennai Corporation (GCC), located in the state of Tamil Nadu in India, was our study area. GCC is divided into fifteen zones, which are further subdivided into 200 wards. Chennai receives almost 60% of its annual rainfall during the north-east monsoon period (October to December). Owing to its flat topography, some localities in Chennai face poor drainage during the monsoon. Chennai experienced severe flooding due to heavy rainfall, with normal life disrupted across the city, in 1976, 1985, 1996, 2005 and 2015, that is, approximately once every decade (National Remote Sensing Centre 2015). In 2015, according to meteorological reports, Chennai received very heavy rainfall of 1471.6 mm during the monsoon between October and December, far in excess of the normal 915.6 mm (Indian Meteorological Department (IMD) 2016). The city experienced one episode of substantial rainfall, about 1049 mm, at the end of November 2015, which filled all the water bodies and caused water logging in some low-lying areas. Extremely high-intensity rainfall was again recorded on December 1 and 2, 2015 at the Nungambakkam and Chembarambakkam rain gauge stations and flooded the entire city. Water levels rose suddenly by about 6-8 m in many areas across the city on December 1, 2015. In a few residential areas water entered houses and reached the first floor, and in some localities even the second floor. People caught unaware by this sudden rise in water level were stranded on terraces without basic needs such as food and water (National Disaster Management Authority (NDMA) Government of India 2017). According to government reports, around 1.8 million people from various localities were moved to relief camps at the time of the flood. As estimated by the media, approximately 500 people lost their lives and the flood caused economic losses of around 200 million rupees (Mujumdar et al. 2016). Figure 1 shows the ward map of GCC (Chennai Corporation 2011) with the zone names.

Flood depth mapping
Data from Twitter and Facebook public pages were utilized in this study for getting water depth information. The process followed for generating water extent maps from social media content is provided in Fig. 2.
As the objective is to generate a water extent map rapidly in real time, messages posted on Twitter and Facebook public pages between 1 December 2015 and 3 December 2015 were considered. On Facebook, only messages shared on public pages were considered, because access to messages posted by individuals is restricted by their privacy settings, whereas messages shared on public pages can be used without restriction. The public Facebook pages created before 3 December 2015 to share details about the flood situation and rescue activities in Chennai were identified. On Twitter, the hash-tags related to the 2015 floods were identified by a general look-up of the tweets. Twitter messages in English and Tamil that carried the identified hash-tags, and tweets that originated from Chennai (within 25 miles of Chennai's geographical coordinates, using the search option in Twitter), were collected for the above-mentioned time frame.
The messages from Twitter and the Facebook pages were screened manually to identify those that contain both location and water depth information about the flood situation. For geo-coded messages, the location was obtained directly from the coordinates specified. For other messages, the location was derived either from the address provided in the text or from the image shared, and the geographical coordinates of that location were then obtained using Google Maps. After the location was extracted, the water depth at that location was derived. In some messages, the water depth was specified directly in feet or metres. In other messages, the water height in a particular area was described with reference to something else, such as up to first-floor or hip-level. One such example message is given below:
"Managed to get out of west mambalam. Water levels we at knee/calf level at arya gowda road till panigraha hall."
Figure 3 shows an example message posted on Facebook in which both location and water height were derived from the text. For the message shown in Fig. 3, the geo-coordinates of the location were extracted from the address and landmark provided, using Google Maps. The water height at that location was mentioned as "up to 1st floor", which was taken to mean a water height of approximately 3 m. Figure 4 shows two images (a and b) posted on social media from which water height and location were extracted. For the first image (Fig. 4a), the location was derived from the text and the water height from the image. For the second image (Fig. 4b), both location and water depth were derived from the photo shared. Messages revealing that a particular locality had no water logging were also treated as water depth points. Example messages that reveal no inundation are given below:

"Our flat in #ValmikiNagar, #Thiruvanmiyur is dry with Internet and electricity. Please get in touch if you need help #ChennaiFloods"
"Loyola college, Nungambakkam has accommodation for rain victims. They have electricity"
Once the water heights and locations were derived, they were mapped over the base map of Greater Chennai Corporation using Quantum GIS (QGIS). Water height points outside the geographical extent of the city were excluded. The water inundation map for the extent of Chennai city was generated by interpolating the water height points using Inverse Distance Weighting (IDW) interpolation in QGIS. IDW is one of the commonly used deterministic spatial interpolation methods in hydrological modelling. It assumes that nearer values are more closely related than farther values, and it works best with dense point values in flat zones (Ly et al. 2013).
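As a rough illustration of the interpolation step described above, the following Python sketch estimates the water height at an unsampled location from nearby points using IDW. It is not the QGIS implementation used in the study: the function name `idw_depth`, the power parameter of 2, the use of planar (projected) coordinates and the sample points are all assumptions made for illustration.

```python
import math


def idw_depth(points, x, y, power=2.0):
    """Estimate water height at (x, y) by inverse distance weighting.

    points: list of (px, py, depth_m) tuples in planar coordinates.
    power:  distance exponent; 2.0 is an assumed, commonly used value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, depth in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            # The query location coincides with an observation.
            return depth
        weight = 1.0 / distance ** power
        weighted_sum += weight * depth
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical water height points (x, y, depth in metres), not study data.
sample_points = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_depth = idw_depth(sample_points, 1.0, 0.0)
```

Because each weight falls off with distance, the estimate at the midpoint of two observations is their plain average, while an estimate closer to one observation is pulled towards that observation's value.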

Fig. 4 Sample images in social media with reference to location and water height
The water extent map generated by interpolation was validated against water height points reported after the field assessment by the Disaster Management Support (DMS) Division, National Remote Sensing Centre (NRSC), India (National Remote Sensing Centre 2015). DMS, NRSC, together with the Indian Institute of Technology (IIT) Madras, carried out a field survey on 24th and 25th December 2015 to collect data on water depth marks in the flood-affected areas in and around Chennai. From this report, field information on water depths at twelve locations was used to validate the water heights derived from social media data. The water height at these twelve locations (where actual information on water depth is available) was extracted from the interpolated map generated using social media content. A t-test was performed to examine whether the difference between the means of the two sets of water heights (actual water depth and depth derived from social media data) was significant. The root mean square error (RMSE) was calculated to quantify the error between the actual heights and the water heights generated from social media.
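The two validation measures can be sketched as follows. RMSE is the square root of the mean squared difference between field and social media heights. The text does not state which form of t-test was used, so the sketch assumes a paired test, which is a natural choice when both values refer to the same twelve locations. The function names and the sample values are hypothetical and are not the data in Table 1.

```python
import math
import statistics


def rmse(actual, estimated):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def paired_t_statistic(actual, estimated):
    """t statistic of a paired t-test (assumed variant), with n - 1 degrees of freedom."""
    diffs = [a - e for a, e in zip(actual, estimated)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Hypothetical water heights in metres, for illustration only.
field_heights = [1.0, 2.0, 3.0, 4.0]
social_media_heights = [1.5, 1.5, 3.5, 3.5]

error_m = rmse(field_heights, social_media_heights)
t_value = paired_t_statistic(field_heights, social_media_heights)
```

The t statistic would then be compared with the critical value for n - 1 degrees of freedom. A value close to zero, as in this toy example, indicates no systematic over- or underestimation even when the RMSE is non-zero.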

Query development
A simple keyword-based query attuned to the Indian urban setting was proposed to filter messages that contain both location and inundation information. Based on experience from manual screening of messages posted during the Chennai flood, the keywords that the affected population used to describe water depths were identified. Water depth keywords are usually combinations, such as a number followed by a unit such as metres or feet (5 feet, 10 cm), a number followed by "floor" (for example, 2nd floor), or indicative levels such as ankle-high and neck-deep. Location keywords include area and road names in the city. The query was framed so that a message is retained only if it contains both a location keyword and an inundation keyword. The query was framed in MySQL, an open-source database management system. As the water depth keywords were identified from manual screening of Chennai flood data, we validated the filtering query with tweets collected during the floods that hit Mumbai city in July 2019. Tweets related to the 2019 Mumbai floods were collected using the search API, based on hash-tags related to the floods. Area and road names of Mumbai city were used as the location keywords in that query.
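The filtering logic described above can be sketched in Python with regular expressions. This is not the authors' MySQL query (which is given in their Additional file 3); the pattern, the keyword lists and the function names below are assumptions that only mirror the rule stated in the text, namely that a message is kept when it contains both a depth mention and a location keyword.

```python
import re

# Depth mentions: number + unit, floor references, or body-level phrases.
# These keyword lists are illustrative assumptions, not the published query.
DEPTH_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:feets?|foot|ft|metres?|meters?|cm|m)\b"
    r"|\b(?:\d+(?:st|nd|rd|th)|first|second)[\s-]*floor\b"
    r"|\b(?:ankle|knee|hip|waist|chest|neck)[\s-]*(?:high|deep|level)\b",
    re.IGNORECASE,
)


def build_location_pattern(location_names):
    """Compile one case-insensitive pattern from area and road names."""
    escaped = sorted((re.escape(n) for n in location_names), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_messages(messages, location_names):
    """Keep messages that mention both a water depth and a known location."""
    if not location_names:
        return []
    location_pattern = build_location_pattern(location_names)
    return [
        message
        for message in messages
        if DEPTH_PATTERN.search(message) and location_pattern.search(message)
    ]


# Hypothetical location list; a real run would use the city's area and road names.
locations = ["lbs marg", "badlapur", "nungambakkam"]
messages = [
    "walked in neck deep water with my colleagues up to lbs marg",
    "water raises above 3 feets at cm high school badlapur",
    "Loyola college, Nungambakkam has accommodation for rain victims",
    "water is 5 feet deep here",
]
kept = filter_messages(messages, locations)
```

In this toy run the first two messages are kept; the third has a location but no depth mention, and the fourth has a depth mention but no known location. Requiring both conditions is what narrows the stream down to messages usable as water height points.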

Results
The Facebook pages related to the 2015 Chennai flood and the Twitter search query are given in Additional file 1. Manual screening yielded 95 points with geographical coordinates and water height above ground level. Figure 5 shows the distribution of the water height points over the base map.
The derived water heights and locations (geographical coordinates) are given in Additional file 2. Among these, 72 points were within the geographical extent of Chennai. The water extent map generated for the entire city by IDW interpolation is given in Fig. 6.
The actual water heights from the field survey report and the water heights from the interpolated map derived from social media at the 12 locations are tabulated in Table 1. According to the t-test, there was no significant difference between the means of the actual water heights and those extracted from social media. The root mean square error between the actual data and the extracted social media data (water height) was 0.3 m.
The MySQL query retains a message only if it contains both water depth and location keywords. The combinations of inundation keywords were handled using regular expressions in MySQL. The query was validated using 17,846 tweets collected during the Mumbai flood, excluding duplicates and re-tweets. When these messages were filtered using the query, it returned 156 tweets. On checking these tweets manually, we found that 102 messages had water depth and location information that can be used for rapid water extent mapping. The query written to filter the Mumbai flood messages is given in Additional file 3. A screenshot of the executed query in MySQL Workbench, with results, is given in Fig. 7. Sample tweets with location and water height mentions that were filtered by the query are given below.
"my college basement was floodedwalked in neck deep water with my colleagues up to lbs marg and then on in waist deep water walked home for nearly an hour and half this is from bhandup to mulund mumbairains"
"water raises above 3 feets at cm high school badlapur water has started entering kitchen godown and classrooms destroying uniforms and food. mumbairains badlapur mahalaxmiexpress ndrf mumbairainsliveupdates"

Discussion
This study examined the feasibility of using social media data for water depth mapping in Indian disaster scenarios. It serves as a proof of concept confirming the potential of social media information for rapid flood mapping in the Indian context to support immediate response and recovery.
The water depth map was generated from information in Twitter and Facebook messages for the 2015 Chennai flood. Based on the derived water height points, localities in Saidapet, Jafferkhanpet and Ashok Nagar had the highest water depths, of more than 4 m. During the flood, Chembarambakkam, one of the tanks that supply water to the city, breached due to the unexpected heavy downpour, releasing thousands of cusecs of water into the Adyar River (Mujumdar et al. 2016). As the Adyar River flows through the city, areas close to the river were highly inundated. This may explain the very high water depths in the above-mentioned areas, which are located close to the Adyar River. The interpolated water extent map also confirms that areas close to the Adyar River had higher water levels. According to the results, wards in the Kodambakkam (ward 142) and Adyar (ward 171) zones had water heights greater than 6 m. Both of these wards are located on the bank of the river. These areas were also reported among the worst affected during the floods (National Remote Sensing Centre 2015). The error between the field data and the social media data was about ± 30 cm. This can be considered acceptable, as the map is meant to be generated rapidly, in near real time during a disaster, when no other source of data is available. The main advantage of social media data in emergency situations is that they are available in near real time and come from the affected population, who are eyewitnesses of the disaster (Fohringer et al. 2015). The keyword-based query considerably reduces the volume of messages for manual screening. This addresses the problem of information overload and helps to create the flood map in time.
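The reduction in screening volume can be quantified from the Mumbai validation counts reported in the Results (17,846 tweets collected, 156 returned by the query, 102 confirmed as relevant). The short Python sketch below does this arithmetic; the function name `filter_summary` is hypothetical. Recall cannot be computed, because the number of relevant tweets among all 17,846 was not reported.

```python
def filter_summary(collected, returned, relevant):
    """Summarise a keyword filter run from three message counts."""
    return {
        # Share of the collected messages left for manual screening.
        "retained_fraction": returned / collected,
        # Share of the returned messages that were actually usable.
        "precision": relevant / returned,
        # Returned messages without usable depth and location information.
        "false_positives": returned - relevant,
    }


# Counts reported for the Mumbai 2019 validation.
summary = filter_summary(collected=17_846, returned=156, relevant=102)
print(f"retained for manual screening: {summary['retained_fraction']:.2%}")  # 0.87%
print(f"precision of returned tweets: {summary['precision']:.1%}")  # 65.4%
```

In other words, under 1% of the collected tweets remained for manual checking, and roughly two out of three of those contained usable water depth and location information.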
We believe that applying this query to streaming online data during future floods in India will provide flood extent or inundation information without delay for emergency management. This will aid in planning rescue operations in accordance with the needs of the affected population.
Because the water depths and locations are extracted manually with approximations, inaccuracies in the data are possible. In emergencies, however, when information on the flood is very scarce, this information still gives some understanding of the flood situation. During the Chennai floods, some localities had power cuts and network problems, which prevented the affected population from posting flood updates on social media. Updates from highly inundated and affected areas may therefore be limited in social media. Instead of being used as a standalone information source, social media information can be used along with other sources, such as remote sensing data and existing digital elevation models, to fill the information gap during a crisis.
Previous studies also mention possible uncertainties in the location information of social media posts (Brouwer et al. 2017; Ogie and Forehead 2018). Development of national scale integrated framework to fuse multiple data sources (social media, remote sensing, topographic and environmental data) in real time by duly taking into account uncertainties in data sources for the purpose of generating precise real-time

Conclusion
This study presented a method and a keyword-based query to filter social media messages that support the extraction of water height information for near real time flood inundation mapping during urban floods in India. The results for the application case, the 2015 Chennai flood, were positive. The advantage of the proposed methodology is the rapid availability of social media data compared with traditional sources of information such as remote sensing and satellite data, particularly in urban settings. During future floods, such rapid, real-time flood maps will improve situational awareness and aid efficient flood management. Social media information on water heights can close the information gaps in traditional information sources. In future, a mapping framework and tool can be developed that automatically derives information from social media through text and image analysis and integrates it with other available sources of information to produce more accurate inundation maps in real time.