Big data analytics to identify and overcome scaling limitations to climate-smart agricultural practices in South Asia (BigData2CSA)

Hamidul Islam, a farmer, harvesting wheat from fields in Rahamanbari Union, Barisal District, in Bangladesh. Photo: Ranik Martin (independent consultant)

Heterogeneity in soils, hydrology, and climate, together with rapid changes in rural economies (including fluctuating prices, aging and declining labor forces, agricultural feminization, and uneven market access), are among the many factors that constrain climate-smart agriculture (CSA) in South Asia's cereal-based farming systems. Most previous research on CSA has, however, employed manipulative experiments analyzing a limited set of agronomic variables, or survey data from project-driven initiatives. This can obscure the identification of relevant factors limiting CSA, at times leading to inappropriate extension and policy, and to institutional alignments that are inadequate to address and overcome constraints. Alternative big data and data mining approaches that draw on heterogeneous datasets remain insufficiently explored, though they can provide a powerful alternative source of information on the performance of technologies and management practices. In partnership with national agricultural research and extension systems (NARES) and the private sector, this project responds by developing systems to rapidly collect, process, analyze, and interpret a wide variety of primary agronomic management and socioeconomic data from tens of thousands of farmers. By fusing these data with spatially explicit soils and hydrological datasets, remote sensing, and gridded climate products, the project will identify location-, age-, and gender-specific factors contributing to or limiting CSA indicators (yield, profitability, GHG emission intensity, resilience) and represent them through interactive web-based dashboards. Alignment with bilaterals and established institutional partnerships will assist in digitally reaching 500,000 farmers with customized management advice on CSA. Alongside the collaborative development of analytical tools, we expect these processes to be institutionalized by next-users, with research informing agricultural policy and development decisions to enable the improved application of CSA.

Project Activities

To ensure that the analytical procedures, tools, and outputs developed by the project will be sustained by partners to identify and develop strategies to overcome limitations to CSA, this activity focuses on the following sequence of actions: (1) Completion of at least one high-level policy event, with participation of key national institutions, presenting results from the previous two years of work (separate events in each country will give partners a direct opportunity to candidly discuss results and develop policy investment targets to scale CSA). (2) Concerted and ongoing efforts with partners to cooperatively develop and translate crop management advisories into an actionable format and deliver digital recommendations on CSA to 500,000 farmers. (3) Awareness-raising among bilateral donors on big data analytics to identify and overcome limitations to CSA scaling. A policy brief will result from this activity.

Using insights from the project inception phase consultations and workshop, improved crop monitoring, digital management surveys, and spatial sampling approaches will be designed and implemented with partners responsible for crop cut networks in India and Bangladesh, with pilot approaches tested in Nepal (see Partner descriptions). Partners will take part in a series of capacity-building activities to improve digital data collection and will be involved in the process of digital survey design, administration, and refinement. Partners will also be engaged to design and develop cloud-based data storage and data cleaning, and will be trained in big-data statistical analytical procedures. Research will also benefit from iterative partner feedback that, when combined with data dimensionality reduction, will over time result in simpler, more effective, and more scalable surveys. By combining capacity development with the collaborative design of dashboards, alongside efforts to demonstrate their CSA policy relevance, the project will empower partners and next-users to sustain research tools and recommendations after 2021.

Building on the primary data collected in Activity 17014, secondary data from spatial databases will be combined with remote sensing-derived data to round out datasets for machine learning and/or multiple regression analysis. Interactive voice response (IVR) and telephone options for the collection of supplementary data will also be explored with PAD. We will transfer data to a customized cloud environment for metadata extraction, curation, and processing to prepare for subsequent analysis. In the subsequent Activity 17018, we will fuse secondary data with primary data and apply machine learning and/or multiple regression to identify the factors most relevant to CSA in consideration of South Asia's environmental and socioeconomic heterogeneity.
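
As a rough illustration of what fusing primary survey records with a gridded secondary product can involve, the minimal Python sketch below attaches a gridded value to each geo-referenced survey record by locating the grid cell that contains it. The field names, the 0.5-degree cell size, and all values are hypothetical and chosen only for illustration; they are not taken from the project's datasets or tools.

```python
import math

# Assumed resolution of a hypothetical gridded climate product, in degrees.
CELL_SIZE = 0.5


def cell_index(lat, lon, cell_size=CELL_SIZE):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def fuse(survey_records, gridded, cell_size=CELL_SIZE):
    """Attach the gridded value to each survey record.

    Records that fall outside the gridded product receive None, so gaps
    stay visible instead of being silently dropped.
    """
    fused = []
    for rec in survey_records:
        key = cell_index(rec["lat"], rec["lon"], cell_size)
        fused.append({**rec, "seasonal_rain_mm": gridded.get(key)})
    return fused


# Illustrative (made-up) survey records and gridded rainfall values.
surveys = [
    {"farmer_id": "F001", "lat": 22.71, "lon": 90.36, "yield_t_ha": 3.1},
    {"farmer_id": "F002", "lat": 27.52, "lon": 83.45, "yield_t_ha": 2.6},
    {"farmer_id": "F003", "lat": 10.00, "lon": 70.00, "yield_t_ha": 2.9},
]
rain = {
    cell_index(22.71, 90.36): 410.0,
    cell_index(27.52, 83.45): 120.0,
}

fused = fuse(surveys, rain)
for row in fused:
    print(row["farmer_id"], row["seasonal_rain_mm"])
```

In this toy example the third record lies outside the gridded product and is kept with a missing value, which is one possible way to flag coverage gaps before analysis.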

Primary and secondary data collected in previous activities will be analyzed using machine learning and/or multiple regression (and/or other potential approaches) to identify factors most relevant to CSA performance (yield, profitability, resilience, and modeled GHG emissions assessed with the CCAFS Mitigation Options tool) in consideration of South Asia's environmental and socioeconomic heterogeneity. Machine learning will be used to reduce data dimensionality (the number of survey questions and survey size) and to identify the data that need to be collected regularly through digital surveys. This process is anticipated to yield smarter, more efficient, and more effective crop performance and management data collection processes, analytics, and data representation tools. Results will be made available to partners through interactive dashboards developed collaboratively with partners in this activity. Dashboards will update after the cropping season to represent factors influencing yield, profitability, and GHG emissions intensity estimates, with disaggregation of data by gender and age.
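
The idea of shortening surveys by keeping only the most informative questions can be sketched with a very simple stand-in: ranking candidate variables by the absolute Pearson correlation between each variable and yield, then retaining the top few. This is a plain correlation filter, not the machine learning or multiple regression methods the project refers to, and the variable names and values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has no variance (for example, a
    survey question that every respondent answered identically).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_variables(records, outcome, candidates):
    """Rank candidate variables by |correlation| with the outcome, highest first."""
    ys = [r[outcome] for r in records]
    scores = {v: abs(pearson([r[v] for r in records], ys)) for v in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative (made-up) survey records.
records = [
    {"n_kg_ha": 60, "sowing_delay_days": 20, "enumerator_code": 1, "yield_t_ha": 2.0},
    {"n_kg_ha": 80, "sowing_delay_days": 5, "enumerator_code": 1, "yield_t_ha": 2.5},
    {"n_kg_ha": 100, "sowing_delay_days": 15, "enumerator_code": 1, "yield_t_ha": 3.1},
    {"n_kg_ha": 120, "sowing_delay_days": 0, "enumerator_code": 1, "yield_t_ha": 3.4},
    {"n_kg_ha": 140, "sowing_delay_days": 10, "enumerator_code": 1, "yield_t_ha": 4.0},
]

ranked = rank_variables(
    records, "yield_t_ha", ["n_kg_ha", "sowing_delay_days", "enumerator_code"]
)
# Keep the two highest-ranked questions for a shorter follow-up survey.
keep = [name for name, _ in ranked[:2]]
print(ranked)
print(keep)
```

A filter like this only captures linear, one-variable-at-a-time relationships; model-based importance measures would be needed to account for interactions and for the environmental and socioeconomic heterogeneity described above.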

The alignment of institutions to make use of research tools is a key requirement to scale out the results of this project. The project had initially planned a launch workshop in 2019, but CCAFS funds arrived several months late, which forced a readjustment of priorities. Activities had already begun with partners, and given the seasonality of work, we decided to focus on deploying fieldwork, data collection, and analysis. As such, the workshop has been shifted to mid-2020 and repurposed as a review of the activities conducted and research results obtained so far by the project across countries. Preliminary results from management practice surveys in Nepal, India, and Bangladesh will be presented, with partners positioned to discuss the interpretation of results and the generation of advisories to encourage CSA mainstreaming. The workshop will be documented in a CCAFS report released in 2020.

Project Deliverables