Preliminaries

This document reports on the cross-correlation analysis between the HI fluctuations probed by BMX and the spatial distribution of the galaxies detected by SDSS.

Topics discussed below include the creation of the SDSS template, the cleaning of the BMX visibilities, and the (cross-)power spectrum analysis.

<aside> ⚠️ This document discusses and shows partial results for only a subset of the available data. A pager for quick visual exploration of the results for all the considered data IDs can be accessed here.

</aside>

Data selection

The first question is which days are good enough to run the cross-correlation analysis on.

I have created a pager that displays diagnostic plots to help identify "good days" for observations. These plots show the temperature of the FPGA, ADC, and frontend, as well as the sun's altitude, as a function of time. I have also tried to use Pearson's correlation coefficient, calculated between the temperature and the sun's position over the SDSS patch, to distinguish good from bad observations; however, I do not believe it is very effective.
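The correlation diagnostic can be sketched as follows. This is a minimal illustration, assuming each data ID provides a temperature time series and the sun's altitude sampled on the same time grid, plus a boolean mask `in_patch` marking the samples during which the SDSS patch is over BMX; the function and argument names are hypothetical.

```python
import numpy as np


def patch_correlation(temperature, sun_alt, in_patch):
    """Pearson correlation between a temperature series and the sun's altitude,
    restricted to the samples where the SDSS patch is over BMX (hypothetical helper)."""
    temperature = np.asarray(temperature, dtype=float)
    sun_alt = np.asarray(sun_alt, dtype=float)
    in_patch = np.asarray(in_patch, dtype=bool)
    t, s = temperature[in_patch], sun_alt[in_patch]
    if t.size < 2:
        return np.nan  # correlation is undefined with fewer than two samples
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
    return float(np.corrcoef(t, s)[0, 1])
```

One possible reason this diagnostic does not separate good from bad days well is that $r$ only measures how linearly the two series co-vary, not how large the temperature excursion is: a day with a tiny but sun-driven drift can score $|r| \approx 1$, while a day with a large drift unrelated to the sun can score $r \approx 0$.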

As you will see, the temperature trends vary widely. The plots have been produced for a large fraction of the ~700 IDs in the /gpfs02/astro/workarea/bmxdata/reduced/pas directory; the remaining IDs could not be plotted because their temperature information was not available.

<aside> ⚠️ As a first pass, Anže and I decided to look at the night days, defined here as the data IDs for which the sun's altitude is below 0° for at least 95% of the time during which the SDSS patch is over BMX. This criterion can of course be revisited as needed.

</aside>
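The night-day criterion reduces to a single threshold test. The sketch below assumes the sun's altitude (in degrees) and an `in_patch` mask are sampled on a uniform time grid, so that the fraction of samples equals the fraction of time; the function name and arguments are hypothetical.

```python
import numpy as np


def is_night_day(sun_alt_deg, in_patch, min_fraction=0.95):
    """True if the sun is below the horizon for at least `min_fraction` of the
    samples taken while the SDSS patch is over BMX (hypothetical helper)."""
    sun_alt_deg = np.asarray(sun_alt_deg, dtype=float)
    in_patch = np.asarray(in_patch, dtype=bool)
    if not in_patch.any():
        return False  # the patch never transits in this data ID
    # Fraction of in-patch samples with the sun below 0 degrees altitude.
    night_fraction = np.mean(sun_alt_deg[in_patch] < 0.0)
    return bool(night_fraction >= min_fraction)
```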

BMX data processing

Even though we rely on a cross-correlation analysis, the BMX data are polluted by a number of contaminants and affected by instrumental effects. As such, some processing is needed before they can be combined with the SDSS data. This section discusses the cleaning steps we apply to the BMX visibilities and how we convert them into fluctuations, $\tilde{V}_{11}(t,\nu) \to \delta_{11}(t,\nu)$ (here the tilde denotes “raw” or “dirty” quantities).

Broadly speaking, based on their temporal structure, we can identify two main classes of spurious signals we want to get rid of:

Slowly varying components (gain variations), which should be divided out;
Quick spikes (satellites and RFI), which should be subtracted or masked.
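The two treatments (divide for slow components, mask for spikes) can be sketched in a few lines of NumPy. This is a minimal illustration, not the BMX pipeline: it assumes a real-valued autocorrelation array `v_raw` of shape (time, frequency), models the slowly varying gain $G(t,\nu)$ as a running median along time, defines the fluctuation as $\delta = \tilde{V}/G - 1$, and flags spikes with a median-absolute-deviation (MAD) threshold. The function names, the window width `gain_width` and the threshold `spike_sigma` are hypothetical choices for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_median(x, width, axis=0):
    """Running median of odd `width` along `axis`, edge-padded so the output keeps x's shape."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    pad = [(0, 0)] * x.ndim
    pad[axis] = (half, half)
    padded = np.pad(x, pad, mode="edge")
    # sliding_window_view appends the window dimension as the last axis.
    windows = sliding_window_view(padded, width, axis=axis)
    return np.median(windows, axis=-1)


def clean_visibility(v_raw, gain_width=51, spike_sigma=5.0):
    """Hypothetical two-step cleaning of a (time, frequency) autocorrelation.

    1. Slowly varying gain: estimated with a running median along time and
       divided out, giving delta = V / G - 1.
    2. Quick spikes: samples deviating by more than `spike_sigma` robust
       standard deviations (MAD-based, per frequency channel) are set to NaN.
    """
    v_raw = np.asarray(v_raw, dtype=float)
    gain = running_median(v_raw, gain_width, axis=0)
    delta = v_raw / gain - 1.0

    # Robust per-channel centre and scale; 1.4826 * MAD estimates sigma for Gaussian noise.
    centre = np.median(delta, axis=0)
    sigma = 1.4826 * np.median(np.abs(delta - centre), axis=0)
    sigma = np.where(sigma > 0, sigma, np.inf)  # never flag a perfectly constant channel

    mask = np.abs(delta - centre) > spike_sigma * sigma
    return np.where(mask, np.nan, delta), mask
```

One caveat of this design: any sky signal that varies more slowly than the smoothing window is divided out together with the gain, so the window has to be long compared with the time scales relevant to the cross-correlation. Masked samples are simply left as NaN here rather than filled in.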

The first step consists of downsampling each of the observed visibilities by a factor of 10 along the time axis to reduce the computational cost. The waterfall plot for a generic observation ID looks like the figure below, where the satellite emission is clearly visible at lower frequencies.
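The text does not specify whether the 10x downsampling averages or decimates the time samples; the sketch below assumes block averaging of consecutive samples along the first (time) axis of an array shaped (time, frequency), dropping any leftover samples at the end. The function name is hypothetical.

```python
import numpy as np


def downsample_time(vis, factor=10):
    """Average consecutive blocks of `factor` time samples (axis 0).

    Trailing samples that do not fill a complete block are dropped.
    """
    vis = np.asarray(vis)
    n_t = (vis.shape[0] // factor) * factor
    trimmed = vis[:n_t]
    # Split the time axis into (n_blocks, factor) and average within each block.
    return trimmed.reshape(n_t // factor, factor, *vis.shape[1:]).mean(axis=1)
```

Averaging rather than keeping every 10th sample also lowers the per-sample noise, at the cost of smearing features shorter than 10 samples, such as brief RFI spikes.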

Waterfall plot of the 11 autocorrelation for a generic observation. The area delimited by the intersection of the dashed red and white lines denotes the “science band” that contains the emission from the SDSS galaxies. Plotted here is the log10 of the 11 visibility (the colorbar spans the 5th to 95th percentile of the array).
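The color scaling described in the caption amounts to taking $\log_{10}$ of the visibility and clipping the color map at the 5th and 95th percentiles. A minimal sketch, assuming a positive, real-valued autocorrelation array (non-positive values would give NaN or -inf under the logarithm, and -inf is not skipped by the percentile); the function name is hypothetical.

```python
import numpy as np


def waterfall_image(vis, lo=5.0, hi=95.0):
    """Return log10 of the visibility and (vmin, vmax) color limits at the lo/hi percentiles."""
    img = np.log10(np.asarray(vis, dtype=float))
    # nanpercentile ignores NaN entries (e.g. masked samples) when computing the limits.
    vmin, vmax = np.nanpercentile(img, [lo, hi])
    return img, float(vmin), float(vmax)
```

The returned limits can be passed as the `vmin` and `vmax` arguments of Matplotlib's `imshow` to reproduce this kind of color scaling.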

Spike flagging and fill-in

For each time bin we individually: