1. Introduction
According to the World Health Organization’s global report on road safety, 1.19 million people died from road traffic incidents worldwide in 2021, with pedestrians accounting for 23% of these fatalities (Global Status Report on Road Safety 2023 2023). Pedestrian vulnerability is particularly pronounced among individuals with disabilities, including those with non-visible cognitive or sensory conditions such as autism spectrum disorder (ASD) (Xiang et al. 2006; Organization 2019; Schwartz et al. 2022). While human drivers may yield to individuals displaying visible disabilities—such as those carrying white canes to indicate visual impairment (Andrew Harrell 1992; Guth et al. 2005)—those with invisible disabilities, like ASD, often face greater challenges and risks due to the absence of externally observable cues (“Going Our Own Way: Public Transit Accessibility for Neurodivergent People” 2023).
ASD is a lifelong neurodevelopmental condition characterized by difficulties in social communication and interaction, as well as sensory sensitivities and restricted behaviors (Diagnostic and Statistical Manual of Mental Disorders: DSM-5 2013; Lord et al. 2018). Its prevalence has sharply increased, with current estimates suggesting that at least 1 in 100 children globally are diagnosed with autism (Zeidan et al. 2022; Talantseva et al. 2023). Despite a growing body of research, studies on autistic individuals in traffic contexts have mainly focused on driving behavior (D. Y. Chee et al. 2017; D. Y. T. Chee et al. 2019), leaving a gap in understanding their behavior as pedestrians—a particularly vulnerable road user group (Earl et al. 2016; Earl et al. 2018; Wilmut and Purcell 2021).
Pedestrian interactions, especially at unsignalized crossings, heavily rely on non-verbal communication, including eye contact, body posture, and subtle gestures that help negotiate right-of-way (Hamilton-Baillie 2004; Bishop, Biasini, and Stavrinos 2017).
This process, referred to in cognitive literature as theory of mind (Baron-Cohen, Leslie, and Frith 1985), intention attribution (Dennett 1971), or mentalizing (Frith 2001), requires quick and accurate interpretation of another’s intentions.
The temporoparietal junction (TPJ) is critically involved in such processes (R. Saxe and Kanwisher 2003; Schurz et al. 2014), and atypical TPJ activity has been repeatedly observed in individuals with ASD (Lombardo et al. 2011; Nijhof et al. 2018), potentially impacting their ability to navigate social negotiation in dynamic environments.
With the rise of autonomous vehicles (AVs), these traditional interaction dynamics are set to change. The AV market is projected to expand to $21 billion by 2035, leading to increasing integration with human road users (Rezwana and Lownes 2024). AVs do not communicate via human social cues, potentially reducing ambiguity for those who struggle with traditional social signaling (Wilmut and Purcell 2021; Rezwana and Lownes 2024). While typically developing individuals might experience confusion in the absence of human driver cues (Van Brummelen et al. 2018), individuals with ASD may benefit from the structured, rule-based behavior of AVs. Autistic drivers, for example, have been shown to outperform their neurotypical (NT) counterparts in tasks that rely on consistency and predictability (D. Y. Chee et al. 2017).
This opens the possibility that autistic pedestrians may also be better attuned to inferring intent from the motion kinematics of AVs—such as changes in speed or trajectory—rather than social cues (Zach Noonan et al. 2023). Research supports the idea that ASD individuals possess a unique affinity for systematic and technological environments, often demonstrating strengths in interaction with robotic agents and predictable digital interfaces (Diehl et al. 2012; Scassellati, Admoni, and Matarić 2012; Dubois-Sage et al. 2024).
These capabilities may reflect a different but equally valid form of ‘machine theory of mind’ (P. Pantelis et al. 2011; P. C. Pantelis et al. 2014), where ASD individuals excel in interpreting motion-based social cues rather than anthropomorphic ones. In fact, ASD participants have been shown to respond more accurately to robot-initiated actions than human ones in imitation tasks (Pierno et al. 2008). This suggests that AVs, as predictable and less ambiguous agents, may offer a communication context that aligns more closely with autistic cognition.
Given the increasing integration of AVs into public infrastructure, it becomes essential to understand how neurodiverse individuals interpret AV behavior. While current AV design often incorporates social or symbolic cues to improve human comprehension (Driggs-Campbell and Bajcsy 2016), few studies consider how such signals are processed differently across populations. This study addresses that gap by investigating behavioral, physiological, and neural responses of autistic and neurotypical adults as they interact with AVs in a simulated road-crossing scenario.
To achieve ecological validity and capture multimodal signals, we employed functional near-infrared spectroscopy (fNIRS) (Villringer and Chance 1997), eye-tracking, and pupillometry. fNIRS has proven effective in detecting social brain responses in ASD individuals (Liu et al. 2019; Zhang and Roeyers 2019), and its portability makes it ideal for mobility studies involving real-time interaction in simulated or naturalistic environments.
By examining how participants respond to AV behavior across different traffic conditions, we aim to advance our understanding of neurodiverse pedestrian safety and inform the design of more inclusive AV systems.
2. Methods
2.1 Participants
A total of 53 participants took part in the study, including 12 adults with autism spectrum disorder (ASD group) and 41 neurotypical adults (NT group). All participants had normal or corrected-to-normal vision and provided informed written consent prior to participation. The study was approved by the ethics committee of the Université de Lyon.
Participants in the ASD group (9 females, 3 males; mean age = 37.6 years, SD = 14.2) were recruited through autism-related communities and forums. All individuals self-identified as autistic and reported having received a formal diagnosis of autism spectrum disorder (ASD) from a healthcare professional, with no history of attention deficit disorder with or without hyperactivity (ADHD). While clinical documentation was not independently verified, participants’ autistic characteristics were further assessed using the Aspie Quiz (https://rdos.net/fr/), a widely used self-report instrument designed to profile neurodivergent and neurotypical traits. The questionnaire provides separate scores for the neurodivergent and neurotypical dimensions (range: 0–200). All included participants showed high neurodivergent scores (mean = 136, SD = 20.4) and low neurotypical scores (mean = 67.4, SD = 21.9), in line with typical profiles observed in diagnosed autistic populations (Ekblad 2013).
Participants in the NT group (31 females, 10 males; mean age = 20.6 years, SD = 1.6) were undergraduate students recruited from the Université de Lyon. Although no autism-specific screening instrument was administered to this group, participants reported no history of developmental or psychiatric disorders.
2.2 Apparatus and Procedure
Participants performed a road-crossing task in a virtual 3D pedestrian simulator presented on a PC. They used a gamepad for realistic movement control and could move and look around within a limited range (see Figure 1).
This task was designed to investigate participants’ behavioral strategies and physiological responses during interactions with autonomous vehicles (AVs). The experimental environment consisted of a gently curved road, providing a clear line-of-sight for participants, along which virtual AVs approached continuously. Traffic scenarios were presented in three distinct experimental conditions, each differing systematically in how the available physical crossing gap evolved over time, while maintaining an identical temporal crossing window of 3 seconds at the crossing point (illustrated in Figure 2).
Participants were explicitly informed that all vehicles were fully automated. In each trial they encountered three sequential inter-group gaps (Gap #1 to Gap #3), simulating recurring crossing opportunities. They were instructed to cross as soon as possible, reflecting real-world pedestrian decision-making. However, only trials in which participants initiated crossing during the first gap (Gap #1) were retained for subsequent analyses. The sequence of traffic conditions was randomized between trials to mitigate potential learning effects or carryover biases.
Eye-tracking data were captured at a sampling rate of 1000 Hz using an Eyelink Portable Duo eye tracker in remote tracking mode. Recorded metrics included gaze positions, fixation duration, and pupil diameter changes, precisely synchronized with key traffic events (e.g., the onset of the initial inter-group gap).
Pupillometry has been widely used to assess cognitive load and emotional arousal, making it a suitable tool for examining how traffic conditions affected workload in each group. To account for the physiological delay in pupil dilation responses to cognitive load, we applied a 1000 ms offset when extracting pupil size data relative to the onset of crossing decision-making. A 0.5 s sliding baseline window was used for normalization.
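As an illustration only, the offset and baseline steps described above could be sketched as follows. The function name, the subtractive form of the normalization, and the choice of the 0.5 s immediately preceding the event as the baseline are assumptions made for this sketch; the text does not specify them.

```python
from statistics import mean


def baseline_corrected_pupil(times, pupil, event_time, offset=1.0, baseline=0.5):
    """Pupil size at (event_time + offset) minus the pre-event baseline mean.

    times, pupil: equal-length lists of timestamps (s) and pupil sizes.
    offset: assumed physiological delay of the pupil response (1000 ms).
    baseline: assumed length (s) of the window preceding the event.
    """
    # Samples falling in the baseline window [event_time - baseline, event_time)
    base = [p for t, p in zip(times, pupil)
            if event_time - baseline <= t < event_time]
    if not base:
        raise ValueError("no samples in the baseline window")

    # Sample closest in time to the delayed read-out point
    target = event_time + offset
    idx = min(range(len(times)), key=lambda i: abs(times[i] - target))
    return pupil[idx] - mean(base)
```

A positive return value indicates dilation relative to the pre-event baseline. A divisive (percent-change) baseline would be an equally plausible reading of the normalization step.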
Concurrently, cerebral hemodynamic responses were monitored using an fNIRS system with a 12×10 optode array covering frontal, temporal, and parietal cortical regions, sampling at approximately 5.5 Hz (see Figure 3). fNIRS signals were acquired using a 26-channel continuous-wave fNIRS system (Cortivision Photon Cap C20).
The placement of fNIRS optodes was determined using a systematic and refined procedure based on an automated meta-analytic approach (Yarkoni et al. 2011). Specifically, we conducted an automated meta-analysis via the Neurosynth database, targeting brain regions consistently activated during tasks involving decision-making and Theory of Mind (ToM). Using the “fNIRS Optodes’ Location Decider” (fOLD) toolbox (Zimeo Morais, Balardin, and Sato 2018), we mapped these identified activation clusters onto the international 10–10 EEG electrode positioning standard. For the time-course analysis, epochs spanned [-5 s, 30 s], with t = 0 defined as the moment the first vehicle entered the 200 m road.
The simulation environment was developed using Unity 3D, with precise synchronization between the virtual environment, eye-tracking data, and fNIRS recordings achieved via the Lab Streaming Layer (LSL) protocol (“LabStreamingLayer’s Documentation Labstreaminglayer 1.13 Documentation,” n.d.). Prior to experimental trials, each participant underwent individual calibrations for both eye-tracking and fNIRS systems, followed by a brief familiarization phase that introduced the vibration feedback delivered when a collision occurred. The entire experimental session, including calibration, data collection, and participant debriefing, lasted approximately 45 minutes. Each participant completed 6 trials per condition (the three traffic conditions and the Baseline condition), resulting in 24 trials per participant.
2.3 Measures
3.1.1 Success Rate: Binary indicator of collision avoidance in each trial
3.1.2 Crossing Start Time: Time from trial onset to entering the potential collision area
3.1.3 Crossing Duration: Time spent crossing
3.1.4 Current Deviation: Time difference between pedestrian and gap center trajectories
3.1.5 Pedestrian Crossing Speed: Speed of the pedestrian avatar in meters per second
3.1.6 Temporal Error: Temporal offset from ideal crossing moment
3.2.1 Gaze Preference: Standardized gaze position projected between gap-related vehicles
3.2.2 Pupil Dilation: Proxy for autonomic arousal
3.3.1 Channel-level activation: Proxy for cortical activation
3.3.2 Permutation cluster test: Activation over the time course
2.4 Data Analysis
Linear mixed-effects models assessed group and condition effects across behavioral and physiological variables. Eye-tracking data were processed via dynamic dot-product alignment to AVs’ screen positions (see Figure 4).
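The alignment procedure is not detailed further in the text. One possible reading, given here purely as a hypothetical sketch, is a scalar projection of the gaze point onto the on-screen segment joining the two vehicles that bound the gap, normalized so that 0 corresponds to the vehicle at the front of the gap and 1 to the vehicle at its rear. The function and argument names are illustrative, not the authors' implementation.

```python
def gaze_projection(gaze, gap_front, gap_rear):
    """Normalised position of a gaze sample between two gap-bounding vehicles.

    gaze, gap_front, gap_rear: (x, y) screen coordinates for one frame.
    Returns 0.0 at gap_front, 1.0 at gap_rear, values outside [0, 1] when
    the gaze falls beyond either vehicle, or None if the vehicles coincide.
    """
    seg_x, seg_y = gap_rear[0] - gap_front[0], gap_rear[1] - gap_front[1]
    rel_x, rel_y = gaze[0] - gap_front[0], gaze[1] - gap_front[1]

    seg_len_sq = seg_x ** 2 + seg_y ** 2
    if seg_len_sq == 0:
        return None  # degenerate segment: no direction to project onto

    # Dot product of the gaze offset with the segment, scaled by its length
    return (rel_x * seg_x + rel_y * seg_y) / seg_len_sq
```

Because vehicle positions change on every frame, the projection would be recomputed per sample, which is one way to interpret "dynamic" alignment.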
fNIRS data preprocessing was completed using the MNE package in Python. The raw intensity data were first converted to optical density (OD) changes, then underwent temporal derivative distribution repair (TDDR) correction (Fishburn et al. 2019) and bandpass filtering (0.01–0.09 Hz) to attenuate motion and other physiological artefacts. The OD data were then converted to hemodynamic responses (HDR) using the modified Beer-Lambert law (Kocsis, Herman, and Eke 2006). For each participant, the oxygenated hemoglobin (HbO) data from the six trials under the Baseline traffic condition were averaged and used as a steady-state control, and the data were z-score normalized to remove the effect of data units and facilitate comparison between traffic conditions. Finally, each channel’s HbO amplitude across the six trials of each participant in each of the three traffic conditions (Const, VarMinus, VarPlus) was calculated to compare the effects of condition and group (ASD, NT).
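The exact z-scoring formula is not given. A minimal sketch, assuming that each condition's HbO values are standardized against the mean and standard deviation of the participant's averaged Baseline signal, might look like this (the function name and inputs are hypothetical):

```python
from statistics import mean, stdev


def zscore_against_baseline(condition_hbo, baseline_hbo):
    """Express HbO samples in units of baseline standard deviations.

    condition_hbo: HbO samples for one channel in one traffic condition.
    baseline_hbo: HbO samples of the averaged Baseline trials (same channel).
    """
    mu = mean(baseline_hbo)
    sd = stdev(baseline_hbo)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("baseline signal has zero variance")
    return [(x - mu) / sd for x in condition_hbo]
```

Standardizing in this way makes amplitudes unit-free, so values can be compared across channels, participants, and traffic conditions.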
3. Results
3.1 Behavioral Outcomes
3.1.1 Success Rate:
A linear mixed-effect model showed that the ASD group demonstrated a significantly higher likelihood of collisions compared to the NT group, as indicated by a positive fixed effect estimate for the group comparison (β = 0.206, SE = 0.086, t(51) = 2.379, p = .021).
Notably, in the VarPlus condition, both groups showed a numerical reduction in risk-taking behaviors such as crossing during intra-group gaps, compared to Const and VarMinus conditions. However, this reduction did not reach statistical significance in either group (ASD: z = -1.29, p = .20; NT: z = -0.99, p = .32), suggesting a trend toward more cautious behavior in the presence of physically expanding crossing gaps.
3.1.2 Crossing Start Time:
We examined the effects of traffic condition and group on pedestrian crossing start time using a two-way mixed-design ANOVA with Condition as a within-subjects factor and Group as a between-subjects factor.
Crossing start time showed a significant main effect of Condition (F(2,96) = 9085.91, p < .001, ηp2 = .995), indicating that crossing decisions varied strongly across traffic conditions. Post-hoc tests on Condition showed that the difference came mainly from VarPlus (see Figure 6).
The Group effect was also significant (F(1,48) = 5.997, p = .018, ηp2 = .111), showing that overall, ASD participants started to cross later than NT participants across conditions.
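For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp2 = (F × df1) / (F × df1 + df2). The short helper below is a sketch for that arithmetic only; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Values reported for Crossing Start Time
eta_condition = partial_eta_squared(9085.91, 2, 96)
eta_group = partial_eta_squared(5.997, 1, 48)
print(round(eta_condition, 3), round(eta_group, 3))
```

Applied to the two effects reported for Crossing Start Time, the identity reproduces the stated values of .995 and .111 after rounding.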
3.1.3 Crossing Duration:
To evaluate potential group differences in motor execution during the road-crossing task, we also analyzed crossing duration, defined as the time participants spent within the potential collision zone (the central area of the crossing that overlaps with the trajectory of the approaching vehicles).
We found no significant group difference in crossing duration (F(1, 669) = 0, p = .99), suggesting intact motor execution in the ASD group once crossing had started.
3.1.4 Current Deviation:
We examined current deviation, a behavioral measure reflecting whether pedestrians were adjusting their speed appropriately relative to vehicle positions, using a mixed-design ANOVA with Group as a between-subjects factor and Condition and Time (–5 s to 0 s relative to crossing midpoint) as within-subjects factors. The results revealed no significant main effect of Group (F(1, 6) = .007, p = .934), no significant main effect of Condition (F(2, 12) = .693, p = .519), and no significant Group × Condition interaction (F(2, 12) = .133, p = .877).
Descriptively, some group- and condition-level differences were observed (see Figure 7), but these did not reach statistical significance. One potential reason for the null results lies in the data quality: the current deviation metric is calculated from the instantaneous spatial relation between pedestrians and approaching vehicles. In our simulation, vehicle position data were updated at a resolution of 4-meter intervals, which introduced temporal imprecision and resulted in sparse or missing current deviation estimates in many trials, especially in the ASD group, where the number of participants was already limited.
To circumvent this limitation, we further analyzed pedestrian crossing speed, which does not depend on vehicle position data and thus offers a more continuous behavioral time course.
3.1.5 Pedestrian Crossing Speed:
The pedestrian crossing speed can be interpreted as the slope in Figure 7. A mixed-design ANOVA on pedestrian speed revealed a significant main effect of Condition (F(2, 94) = 3.43, p = .036, ηp2 = .068), suggesting that, despite the limitations of the current deviation measure, time-resolved modulations of movement speed captured condition-dependent strategies in crossing decisions. Follow-up pairwise comparisons did not yield statistically significant differences.
3.1.6 Temporal Error:
The temporal error represents the deviation from the ideal mid-gap crossing moment. For our 3-second gaps, the ideal mid-gap moment is at 1.5 seconds. A mixed-design ANOVA revealed significant main effects of Condition (F(2,96) = 19.323, p < .001, ηp2 = .287) and of Group (F(1,48) = 6.226, p = .016, ηp2 = .115). Figure 8 shows that ASD participants crossed the mid-point significantly later than NT participants. This result is consistent with our finding on Crossing Start Time that ASD participants started crossing later.
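To make the definition concrete, the measure can be sketched as the difference between the moment the pedestrian reaches the crossing mid-point and the middle of the 3-second gap. The function name, argument names, and sign convention (positive = later than ideal) are assumptions made for this illustration.

```python
def temporal_error(midpoint_crossing_time, gap_onset_time, gap_duration=3.0):
    """Offset (s) between the actual and ideal mid-gap crossing moments.

    midpoint_crossing_time: time at which the pedestrian reaches the
        crossing mid-point.
    gap_onset_time: time at which the gap opens at the crossing point.
    gap_duration: temporal size of the gap (3 s in this study).
    Positive values mean the pedestrian crossed later than the ideal moment.
    """
    ideal_time = gap_onset_time + gap_duration / 2.0
    return midpoint_crossing_time - ideal_time
```

For example, a pedestrian who reaches the mid-point 2.0 s after the gap opens has a temporal error of +0.5 s under this convention.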
3.2 Eye-Tracking and Pupillometry
3.2.1 Gaze Preference:
Since the gaze data were sparse, we used a mixed linear model on gaze projections. With Subject as the random effect, Group, Condition, and Time as fixed-effect variables, we found a main effect of Condition (F(2,40.33) = 9.297, p < .01). However, no significant main effect of Group emerged, although descriptive visualizations (see Figure 10) suggested subtle group-level differences in attentional allocation patterns.
3.2.2 Pupil Dilation:
Pupil dilation responses indicated reduced sensitivity in ASD participants across conditions, aligning with hypo-arousal theories (Zhao, Liu, and Wei 2022). Under the VarPlus condition, however, ASD participants showed significantly increased pupil dilation, reflecting heightened cognitive demand and higher sensitivity to the manipulated variable in VarPlus condition (i.e. absolute speed variability).
3.3 fNIRS Findings
As HbO has been shown to correlate better with blood flow compared to HbR and total hemoglobin values (Hoshi, Kobayashi, and Tamura 2001), we focused on HbO signals for analysis. We performed two complementary statistical analyses. First, a time-averaged channel-level comparison assessed the overall activation difference between groups for each channel, regardless of temporal dynamics. Second, a time-resolved cluster-based permutation test was conducted to identify whether group differences occurred as contiguous temporal clusters within each channel under a specific condition.
3.3.1 Channel-level activation
For each subject and each channel, we computed the contrast value as the GLM beta estimate of each Condition of interest (Const, VarMinus, VarPlus) minus the Baseline condition data. These contrast values were then entered into a linear mixed-effects model with Group, Condition (the three contrasts), and Channel as fixed effects, and Subject as a random effect. The model revealed a significant main effect of Group (F(1,52.7) = 6.10, p = 0.017), indicating that ASD and NT participants showed different mean activation contrasts across all channels and conditions. No significant main effect of Condition and no significant three-way interaction were observed. The Group × Channel interaction was significant (F(24,2978) = 3.41, p < 0.001), highlighting spatial heterogeneity in group differences. Post-hoc examination of fixed effects showed that channels Fz_F1 (frontal; p = 0.00318) and F7_FT7 (left temporal; p = 0.0022) exhibited especially pronounced group differences.
Pairwise t-tests across all channels identified a broader set of significant group differences (see Figure 12). The significant or marginal group differences were mapped onto the initial layout (see Figure 13).
ASD participants generally exhibited lower HbO than NT participants across most channels, especially in the frontal region (e.g. Fz_F1, where HbO was significantly lower in the ASD group). In contrast, HbO was higher in the ASD group than in the NT group in the left temporal area (i.e. F7_FT7). For both channels (Fz_F1, F7_FT7), robust group effects were also confirmed using linear mixed-effects modeling.
Only double-confirmed effects (Fz_F1 in the frontal area, and F7_FT7 in left temporal) are interpreted as robust findings; additional significant channels identified by t-tests alone are considered exploratory.
3.3.2. Permutation cluster test
To identify group-level differences in time courses, we conducted a cluster-based permutation test on HbO signals across conditions. The test was applied to each channel and each condition individually, using participant-level HbO time courses as input.
Cluster-based permutation testing revealed significant (p < .05, cluster-corrected) group differences under the VarPlus condition at frontal channel AF4_F2 and left parietal channel CP5_P5. ASD participants exhibited increased HbO signals in these channels prior to crossing attempts, suggesting that the cognitive differences emerged even before crossing. ASD pedestrians thus engaged neural processing differently from NT pedestrians under the VarPlus condition.
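The logic of a cluster-based permutation test on a single channel can be sketched as below. This is a simplified, standard-library illustration and not the analysis code used in the study: the Welch t statistic, the fixed cluster-forming threshold of 2.0, the use of summed |t| as cluster mass, and all function names are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def cluster_masses(t_values, threshold):
    """Sum |t| over each run of contiguous same-sign supra-threshold points."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += abs(t)          # extend the current cluster
        else:
            if sign != 0:
                masses.append(current)  # close the previous cluster
            current, sign = (abs(t), s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses


def cluster_permutation_test(group_a, group_b, threshold=2.0,
                             n_perm=1000, seed=0):
    """Return (largest observed cluster mass, permutation p-value).

    group_a, group_b: lists of per-participant time courses of equal length.
    """
    rng = random.Random(seed)
    n_times = len(group_a[0])

    def max_mass(a, b):
        t_vals = [welch_t([s[i] for s in a], [s[i] for s in b])
                  for i in range(n_times)]
        return max(cluster_masses(t_vals, threshold), default=0.0)

    observed = max_mass(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to groups at random
        if max_mass(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Comparing the observed cluster mass against the distribution of the largest mass under shuffled group labels is what controls the family-wise error rate across time points. In practice, a maintained implementation such as the cluster permutation routines in MNE would normally be preferred over a hand-written version.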
4. Discussion
Our multimodal findings illuminate distinctive cognitive responses from ASD pedestrians during interactions with autonomous vehicles, resonating strongly with predictive coding models of autism (Van de Cruys et al. 2014). The heightened hesitation (i.e. Crossing Start Time) and sensitivity to speed variability (i.e. specific performance under VarPlus) observed in ASD participants likely arise from atypical sensory processing and decision-making processes, rather than deficits in motor execution.
The reduced HbO activation in frontal areas coupled with increased temporal and parietal activations in ASD could reflect diminished engagement of regions involved in rapid social inference and executive control (frontal cortex) alongside enhanced reliance on regions implicated in sensory integration and motion perception (temporal and parietal cortices). The increased pupil dilation specifically under VarPlus conditions further corroborates increased cognitive load and autonomic arousal (Zadok et al. 2024) associated with unpredictability and sensory uncertainty.
Hemodynamic response data from fNIRS further enriched our understanding by revealing condition-specific cortical activation patterns. Reduced frontal cortex activation (channel Fz_F1) in the ASD group potentially indicates diminished reliance on executive control regions typically engaged during rapid social inference or cognitive conflict resolution.
Conversely, enhanced activations observed in temporal (F7_FT7) and parietal areas (CP5_P5) are particularly intriguing, as these regions are known to play critical roles in social perception, motion prediction, and mentalizing processes (Rebecca Saxe and Powell 2006; Van Overwalle and Baetens 2009). This distinct activation profile suggests that ASD individuals may engage alternative neural pathways, possibly compensatory mechanisms, to process dynamic social cues and motion-related environmental changes presented by AVs.
The permutation-based cluster analysis revealed temporally specific neural activation differences in frontal (AF4_F2) and parietal (CP5_P5) regions before crossing onset in the VarPlus condition. These early divergences suggest that autistic participants may allocate more anticipatory resources when faced with dynamically changing environments. Such a pattern is consistent with theories of altered sensory precision in autism, implying that under uncertainty, autistic individuals may engage more cortical processing to integrate sensory cues and guide decision-making.
These findings collectively underscore the importance of designing AV communication systems that explicitly account for neurodiverse cognitive and perceptual processing styles. Incorporating clear, explicit, and dynamically adaptive signals into AV-pedestrian interactions can significantly improve safety and inclusivity. Our study also emphasizes the potential utility of multimodal physiological signals, such as fNIRS and pupillometry, as biomarkers for adaptive AV systems, enhancing responsiveness to the varied cognitive strategies employed by different pedestrian populations.
However, several limitations should be acknowledged, including modest statistical power due to the smaller ASD sample, and age discrepancies between groups. Future research with larger, better-matched samples and refined measures (e.g., higher temporal resolution for vehicle position tracking) is essential to strengthen and generalize these findings.
Ultimately, our multimodal approach reveals that the distinct behaviors and neural responses observed in autistic pedestrians are predominantly driven by unique sensory processing and decision-making characteristics rather than motor deficits. This highlights critical considerations for creating inclusive AV technology, facilitating safer interactions for neurodiverse populations in urban mobility environments.
5. Conclusion
Autistic individuals exhibit distinct behavioral and cognitive patterns in crossing decisions involving AVs. Their challenges do not stem from motor execution but from upstream decision processes tied to sensory processing, social perception and arousal. Designing AV systems that accommodate such differences can promote safer, more inclusive mobility ecosystems.
Declaration of Competing Interest
The authors declare no competing interests.
Funding
This research was supported by public funding from the Délégation à la sécurité routière (DSR) (Grant no.xxxxx) awarded to Jordan Navarro.
Declaration of Generative AI in the Writing Process
During the preparation of this article, ChatGPT was used to assist with language refinement and formatting. The final content was reviewed and verified by the authors.
Data Availability
Data underlying this article will be made available upon request due to privacy constraints regarding sensitive participant information.
References
Citation
@online{huang2025,
author = {Huang, Wenjie and Gaujoux, Vivien and Fournel, Arnaud and
Cegarra, Julien and Reynaud, Emanuelle and Navarro, Jordan},
title = {Mind the {Gap:} {How} {Autistic} {Pedestrians} {Interact}
with {Autonomous} {Vehicles}},
volume = {x},
number = {x},
date = {2025-05-23},
doi = {xxxxx},
langid = {en},
abstract = {The rise of autonomous vehicles (AVs) necessitates a
deeper understanding of how diverse populations, particularly those
with atypical cognitive profiles, perceive and interact with these
systems. This study assessed behavioral, physiological, and neural
differences between autistic (ASD) and neurotypical (NT) adults
during a 3D simulated road-crossing task involving AVs. Using fNIRS,
eye-tracking, and pupillometry, we examined how participants
interpreted dynamic AV behavior under three conditions: constant,
decreasing, and increasing physical crossing opportunities. Autistic
participants showed delayed crossing initiation and higher collision
risk, despite similar crossing durations. Gaze patterns were
condition-dependent but not group-dependent, and pupil dilation
indicated lower baseline arousal in ASD, with a specific increase
under dynamic uncertainty. fNIRS revealed decreased frontal
activation but elevated temporal and parietal activation in ASD,
with significant group divergences in HbO time courses emerging
before crossing onset under the VarPlus condition. These comparative
findings suggest distinct cognitive processing when interpreting AV
behavior in crossing scenarios. This study underscores the need for
inclusive urban mobility design and highlights potential avenues for
tailoring AV communication to accommodate neurodiversity.}
}