### Introduction

Traditional noise removal methods based on Fourier analysis, Radon transforms, and prediction filtering have occupied an important place in our arsenal of methods for noise attenuation. They all rely on different assumptions and, in general, on a simple signal model. An open problem in seismic data processing is the establishment of proper basis functions to represent seismic data. In addition, it is important to understand how one can use an optimal data representation to separate signal from random and coherent noise, and to regularize aliased and noisy data. We will review how old meets new and in particular, how one can complement well-known methods for noise attenuation with novel ideas in the field of sparse coding and compressive sensing. In addition, we provide examples of seismic data regularization where some of these ideas lead to algorithms amenable to regularizing data in the presence of strong coherent noise and aliased noise.

### Local basis functions and sparsity

We start our discussion by considering the classical problem of representing a signal in terms of a superposition of basis functions. The representation can be written using the simple language of linear algebra as follows

where d is the Nx1 vector of spatial data and a denotes the Mx1 vector containing the coefficients of the expansion. The columns of the matrix G are the basis functions of the expansion. In general, this model is very flexible if we consider an overcomplete expansion. The latter, in simple terms, means that M>N. For over-complete expansions, we have an underdetermined problem and a priori information is needed to estimate the coefficients of the expansion. In real applications, equation (1) is further complicated by the presence of noise and incomplete sampling. This can be mathematically expressed via,

where n is the noise and T is a sampling operator. The underdetermined problem admits an infinite number of solutions. Among all possible solutions, we often chose one that leads to sparse representations (Olshausen and Field, 1996; Sacchi et. al, 1998; Hennenfent and Herrmann, 2006). The latter is a signal expansion where only a few columns of G are used to fit the observed data. We also stress that sparsity is the core of new paradigms in signal processing and harmonic analysis that are studied under the field of compressive sensing (Donoho, 2006).

The classical signal processing model maintains that a spatial signal must be sampled at a rate at least twice it highest wavenumber in order to properly represent the unrecorded continuous wave field. However, sub-Nyquist sampling is possible when the signal admits a sparse representation. This is known as compressive sensing or compressive sampling, a theory that has become quite popular in the signal and image processing communities (Candès a and Wakin, 2008).

The importance of sparsity in data recovery problems can be demonstrated with a very simple example. Let’s consider basis functions given by Fourier complex exponentials that have been multiplied by a Gaussian window and translated in time. In other words, the expansion of the signal is in terms of Gabor atoms (Li and Ulrych, 1996). The coefficients of the expansion are a function of time and frequency or space and wavenumber. In Figures 1 and 2 we compare the reconstruction of a highly non-stationary signal using the classical Gabor expansion and the sparse Gabor representation. Figure 1a portrays a synthetic hyperbolic chirp with erasures. The instantaneous frequency used to generate the signal is portrayed in Figure 1b. The classical Gabor transform representation is provided in Figure 1c. In this case, one can show that the coefficients of the representation have minimum norm (in an L2 sense). The reconstruction of the data is portrayed in Figure 1e. It is clear from these figures that the minimum norm solution to equation (2) has lead to coefficients that cannot recover the missing data. On the other hand, Figure 2 illustrates the case were sparsity was imposed upon the Gabor coefficients. Not only does the estimated Gabor t-f plane in Figure 2c shows fewer artefacts but, in addition, the Gabor synthesis was capable of recovering the missing samples. Figure 1. a) A hyperbolic chirp with erasures. b) Instantaneous frequency of the hyperbolic chirp. c) Time-frequency plane estimated via the Gabor transform. d) Gabor spectrum at 0.3s. d) Reconstruction of the chirp. It is clear that the estimated Gabor representation fails to reconstruct the data. Figure 2. a) A hyperbolic chirp with erasures. b) Instantaneous frequency of the hyperbolic chirp. c) Time-frequency plane estimated via the sparse Gabor representation. d) Gabor spectrum at 0.3s. d) Reconstruction of the chirp. In this example the sparsity constraint was utilized to compress the signal in the t-f plane.

The aforedescribed algorithm can be modified to regularize 2D data with strong variations of dip (Sacchi et al., 2009). For this purpose, the reconstruction is conceived in the f-x domain where the 1D Gabor reconstruction algorithm is applied at each individual temporal frequency f. The results are shown in Figure 3. Figure 3. a) Common offset section. b) Decimated common offset section. c) Reconstruction via the sparse f-x Gabor representation. d) Reconstruction error.

We now consider an example where the basis functions are curvelets. The curvelet transform is a local and directional decomposition of an image (data) into harmonic scales (Candes and Donoho, 2004; Hennenfent and Herrmann, 2006). The curvelet transform aims to find the contribution from each point of data in the t-x to isolated directional windows in the f-k domain. Figures 4a and 4c portray a shot gather before and after reconstruction using the curvelet expansion. It is well known that low frequencies are less likely to be contaminated by alias. This is clearly seen in Figure 5a where we have portrayed the curvelet coefficients in a diagram that shows the individual curvelet t-x patches organized according to scale (frequency) and directionality (angle or dip). Figure 5b shows a mask function derived by applying a threshold operator to the low scales. The alias-contaminated scales were annihilated using a mask function extracted from the low alias free scale (Naghizadeh and Sacchi, 2009). The mask function behaves like a local all-pass operator that eliminates the coefficients contaminated by alias. The surviving coefficients are fit to the observed data via the least-squares method. Finally, the estimated coefficients are used to synthesize the un-aliased data showed in Figure 4c. It is clear that this approach resembles the classical f-x (Spitz, 1991) interpolation method that relies on low frequency information to reconstruct aliased data. This is good example, we believe, of “old” ideas informing new development in signal processing. Figure 4. a) Shot gather. b) Spectrum of the shot gather. c) Beyond alias interpolation of the shot gather via an expansion in terms of curvelets. d) Spectrum of the interpolated gather. Figure 5. a) Shot gather presentation in the curvelet domain. b) Mask function derived from the alias-free coarse scales. c) Curvelet coefficients found via least-squares inversion. The inverted coefficients were used to synthesize the data in Figure 4c.

In our last example, we discuss the problem of finding an expansion of data by sparse coding. In this case, the basis functions are unknowns and are derived from the data. The problem is reminiscent of blind deconvolution where the reflectivity and the wavelet are simultaneously recovered from an observed seismic trace. Figure 6 portrays a synthetic example were the basis functions were extracted from the data via the algorithm described in Kaplan et al. (2009). The extracted basis functions were used to fit the data and the resulting de-noised image is provided in Figured 6b. The same method was used to isolate basis functions for the attenuation of multiple reflections. This is shown in Figure 7a. As we have already mentioned, the basis functions were extracted directly from the data. In this case, they were sorted according to smoothness and flatness, and those with the least amount of flatness were used to model the noise and the multiples. The model of multiples generated via the expansion with the retrieved basis functions is finally subtracted from the data to recover the primary reflections. Figure 6. (a) The basis patches found from sparse coding. (b) (Left) Data from where the basis patches are found, (centre) filtered data, and (right) the residuals. Figure 7. a) Basis patches found by sparse coding. (b) The CMP gather from where the bases were found. (c) Multiples estimated using sparse coding. (d) Primaries estimated via sparse coding. (e) Multiples estimated using the parabolic Radon transform. (f) Primaries estimated via the parabolic Radon transform.

### Conclusions

The concept of data expansion is fundamental for seismic data processing. We continuously represent our data in other domains and hope that these domains will help us to separate signal from noise, reconstruct data, and to analyze phenomena that are hidden in the original data space. The tools of linear algebra and linear inverse theory lead to interesting algorithms to retrieve sparse data representations that can be used for filtering, analysis and reconstruction.

We have briefly discussed the problem of estimating the basis functions from the data. The latter could lead to the new paradigms in seismic signal processing where basis functions for noise removal and for data reconstruction are estimated by exploiting data redundancy in seismic volumes. Mauricio D. Sacchi was born in Colonel Brandsen (Argentina). He attended the National University of La Plata where he received a Diploma in Geophysics in 1988 and the University of British Columbia, Vancouver, where he received a PhD in geophysics in 1996. He joined the University of Alberta in 1996 where he teaches geophysical signal processing, introduction to seismic imaging, and inverse problems. His research interests are in applied seismology, global seismology and ultrasound medical imaging. He runs, in collaboration with graduate students and post-doctoral fellows, the Signal Analysis and Imaging Group. The latter is an industrial collaborative effort for research in statistical seismic signal analysis, transform-based methods for noise removal, seismic data regularization and imaging.

Sam T. Kaplan received his B.Sc. in Geophysics and Mathematics in 2000, and his Masters in Geophysics in 2003, both from the University of British Columbia. Following his M.Sc., he worked in Vancouver, Canada, implementing computer algorithms for the M-OSRP (University of Houston) research group. Currently, he is completing a Ph.D. in seismic imaging at the Department of Physics, University of Alberta. Sam has spent the last two summers working for ConocoPhillips Houston. His research interests are in the area of seismic signal processing, inverse problems, imaging and high performance computing.

Mostafa Naghizadeh received B.Sc. in Mining Engineering from the University of Kerman, Iran, in 2000 and M.Sc. in Geophysics from the University of Tehran, Iran, in 2003. He received his PhD in Geophysics from the University of Alberta in 2009. His interests are in seismic data reconstruction methods, sampling theory and seismic imaging. He currently holds a postdoctoral fellow position with the Signal Analysis and Imaging Group (SAIG) at the University of Alberta. Mostafa is the recipient of an Honorable Mention for Best Student Paper at the CSEG 2006 convention, CSEG Best Student Paper Award in 2007 and 2009, and best SEG Student Poster Award in 2008.

### References

Candes, E. J. and D. L. Donoho, 2004, New tight frames of curvelets and optimal representations of objects with piecewise-c2 singularities: Comm. on Pure and Appl. Math., 57, 219–266.

Candès E. and Wakin M., 2008, An introduction to compressive sampling: IEEE Signal Processing Magazine, 25(2), pp. 21 - 30.

Donoho, D., 2006, Compressed sensing: IEEE Trans. on Information Theory, 52(4), pp. 1289 – 1306.

Hennenfent, G. and Herrmann, F.J., 2006, Application of stable signal recovery to seismic data interpolation: SEG, Expanded Abstracts, 25 (1), 2797-2801.

Li, Xin-Gong and Ulrych, T.J., 1996, Coherent noise filtering using a 2 D Gabor transform: SEG, Expanded Abstracts, 15 (1), 1180-1183.

Kaplan S. T., Sacchi M. D. and Ulrych T.J., 2009, Sparse-coding for data-driven coherent and incoherent noise attenuation: 79th Annual International Meeting, SEG, Expanded Abstracts. 3327-3331.

Naghizadeh N. and Sacchi M.D., 2009, Beyond alias hierarchical scale curvelet interpolation of regularly and irregularly sampled seismic data: SAIG10-6 Technical Report, 106-131.

Olshausen, B. A. and Field, D. J., 1996, Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609

Sacchi M.D., Ulrych T.J. and Walker C., 1998, Interpolation and extrapolation using a high-resolution discrete Fourier transform. IEEE Trans. on Signal Processing, 46(1) pp. 31 - 38.

Sacchi M D., Kaplan S.T., and Naghizadeh M., 2009, fx Gabor Seismic Data Reconstruction, EAGE conference, Amsterdam, Expanded Abstracts.

Spitz, S. ,1991, Seismic trace interpolation in the F-X domain: Geophysics, 56(6), 785–796.