CGG Smart Data Solutions, like many other service providers in the E&P sector, has seen data management transform from a support function into a value generator for our clients. Companies expect service providers to deliver solutions that enable their E&P digitalization strategies as they move towards adopting emerging digital technologies, such as machine learning and public cloud services. While these new frontiers are exciting and ultimately game-changing, it is extremely important to ensure all factors are considered before choosing a solution that is right for your business. This is why I encourage people to “ASK before you ACT”. Jumping headfirst into a new business endeavor without first defining the desired outcome, and then asking how we get from here to there, will typically not deliver the desired results.
One data type we have been focusing on at CGG, particularly for cloud computing, is seismic data. Seismic is an interesting data type to discuss when it comes to emerging cloud storage and processing technologies. With the breadth of data associated with the seismic life cycle, and the inherent file-size challenges of acquired (field) data, processed data and derived work products, picking one solution that meets all of your business needs and requirements can be a daunting task. Multiple questions start to pop up once a company decides to explore migrating (moving) its data to the cloud, the first typically being: why should I consider using the cloud at all? While some may answer differently, the majority of people I have spoken with give essentially the same response. The cloud provides the opportunity to promote a greater level of accessibility and use for the various data assets and applications we rely on. While storage and preservation of the data are also benefits of this environment, the real value comes from data being actively used to increase productivity and improve efficiency.
A few other questions we will explore are:
- Should all of my seismic data be loaded, and how will it be stored?
- Are there any concerns or issues with loading seismic data?
- How long will it take to load or retrieve the various data sizes, and who has access?
- Can seismic data integrate with other software in the cloud, and if so, how does that work?
Should all of my seismic data be loaded, and how will it be stored? I have encountered mixed reactions within the E&P industry on how to answer this, mainly because of the ever-increasing volume of data generated when new seismic is shot and the ever-changing technical and commercial options offered by the leading cloud service providers. Offshore 3D surveys, for example, are in some cases starting to generate data volumes of half a petabyte or more. Plugging this data into the cloud sounds easy, but is it really that simple? The cost of storing that much data in the cloud needs to be weighed against the alternatives, such as traditional non-cloud archive options. How each data type will be stored, and which cloud vendor you will select, also needs to be determined. With so many cloud solutions in the market, choosing the right cloud architecture for your business will help you achieve your desired results, and sorting through the various architectures and storage options each provider offers is a necessary process that takes time. For some companies, keeping all of their data in a more active storage tier might be feasible and make sense, given a requirement to access those assets constantly; this results in a higher overall storage cost but reduces the cost of retrieving data compared with a deeper, less active tier. Other companies may only need to access data from the cloud once or twice a year, so a deeper cold storage tier might be a better fit: storage costs stay much lower, but retrieval costs increase substantially. For others, a mix of options works best, with some data stored in active tiers and some stored deeper. All of these choices affect the overall cost of storing and managing your digital data assets, so understanding the best option for each data type becomes an important question to answer.
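To make that trade-off concrete, here is a minimal sketch, in Python, of how the annual cost of holding a large survey in a "hot" versus a "cold" storage tier might be compared. The per-GB prices, retrieval frequencies and access patterns are hypothetical placeholders, not quotes from any provider; the point is simply that frequently accessed data favors active storage while rarely accessed data favors archive tiers.

```python
# Illustrative sketch only: the per-GB prices and retrieval assumptions below
# are hypothetical placeholders, not quotes from any cloud provider. The point
# is the trade-off: "hot" tiers cost more to hold but little to retrieve, while
# "cold"/archive tiers are cheap to hold but costly and slow to retrieve.

def annual_tier_cost(volume_gb, storage_per_gb_month, retrieval_per_gb,
                     retrievals_per_year, fraction_retrieved):
    """Rough annual cost of keeping a dataset in a single storage tier."""
    storage = volume_gb * storage_per_gb_month * 12
    retrieval = volume_gb * fraction_retrieved * retrieval_per_gb * retrievals_per_year
    return storage + retrieval

survey_gb = 500_000  # a ~0.5 PB offshore 3D survey, as mentioned above

# Hypothetical price points chosen only to show the shape of the comparison.
hot = annual_tier_cost(survey_gb, storage_per_gb_month=0.020,
                       retrieval_per_gb=0.00, retrievals_per_year=12,
                       fraction_retrieved=0.10)
cold = annual_tier_cost(survey_gb, storage_per_gb_month=0.002,
                        retrieval_per_gb=0.03, retrievals_per_year=2,
                        fraction_retrieved=0.10)

print(f"Hot tier, frequent access : ~${hot:,.0f}/year")
print(f"Cold tier, rare access    : ~${cold:,.0f}/year")
```

Running the same comparison with your own data volumes, access patterns and the actual prices quoted by a provider is a quick way to decide, data type by data type, which tier or mix of tiers makes sense.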
Are there any concerns or issues with loading seismic data? As with any emerging technology, potential issues will arise as the company and its vendors navigate this new path together. A good example that needs to be investigated and resolved before the loading process begins is the application of a naming standard for the various seismic data types. A data dictionary and data standard should always be established and followed within the company to ensure continuity across datasets. Many businesses choose to use a system of record to establish and maintain the data dictionary and to support an ongoing organizational structure. If data is loaded “as is”, without any checks and balances, then locating and retrieving the correct data in the future becomes a much more difficult task, and confidence in the data that is retrieved suffers. In a business where making informed decisions based on the correct data in a timely fashion is so vital, taking the time at the beginning of a project to establish these data rules and procedures is necessary.
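As a hypothetical illustration of what those checks and balances might look like before loading, the short sketch below validates candidate dataset names against a simple naming convention. The pattern itself is an invented example of a corporate data standard, not an industry rule; any real standard would come from your own data dictionary and system of record.

```python
import re

# Illustrative only: the naming convention below is a made-up example of a
# corporate data standard. The idea is to validate names against the agreed
# data dictionary *before* anything is loaded "as is".
#
# Example pattern: <COUNTRY>_<SURVEY>_<YEAR>_<TYPE>, e.g. "NOR_VIKING_2018_PSTM"
NAME_PATTERN = re.compile(r"^[A-Z]{3}_[A-Z0-9]+_\d{4}_(FIELD|PSTM|PSDM|STACK)$")

def validate_dataset_names(names):
    """Split candidate dataset names into those that pass and fail the standard."""
    passed, failed = [], []
    for name in names:
        (passed if NAME_PATTERN.match(name) else failed).append(name)
    return passed, failed

candidates = ["NOR_VIKING_2018_PSTM", "gom survey final v2", "BRA_SANTOS_2021_FIELD"]
ok, rejected = validate_dataset_names(candidates)
print("Ready to load:", ok)
print("Fix before loading:", rejected)
```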
How long will it take to load or retrieve the various data sizes? This is important for business lines and data managers to be aware of and track. Timelines matter and in some cases become critical. Retrieving a large offshore 3D survey, for instance, will take considerably longer than downloading some 2D land data simply because of the file sizes involved. Being aware of these differences is crucial when project deadlines are approaching, so teams know how much time to allow for data transfers to and from the cloud. The variation in download and access times between the data center location and the local office also needs to be considered when establishing turnaround times. The IT group needs to determine, as closely as it can, the amount of bandwidth the organization will require for cloud transfer operations, considering how many users and systems will be pushing and pulling data across the network, alongside other processes, during both peak and non-peak business hours. Higher internet speeds allow more data to be moved in a timely fashion, so the IT team should be aware of your business needs and ensure sufficient speed is in place for your end of the data pipeline. Some cloud providers offer services that physically ship data to and from the cloud to assist with transfer speeds (Azure Data Box, AWS Snowball); at least until network speeds catch up with seismic data volumes, these may be the quickest and most efficient ways to move large volumes. Even if the cloud provider offers the fastest connection on the planet, if your business has a low-speed connection to the pipeline, uploads and downloads will be very slow.
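A rough back-of-the-envelope calculation helps set expectations here. The sketch below estimates transfer time from data volume and link speed; the link speed, efficiency factor and data volumes are assumed example values only, but they show why shipping appliances such as Data Box or Snowball become attractive at the petabyte scale.

```python
# Back-of-the-envelope sketch: transfer time scales with volume divided by
# effective throughput, and real-world throughput is usually well below the
# nominal link speed. The link speed and efficiency below are assumed examples.

def transfer_time_hours(volume_tb, link_gbps, efficiency=0.6):
    """Estimate hours to move a dataset over a network link.

    efficiency accounts for protocol overhead and shared bandwidth (assumed).
    """
    volume_bits = volume_tb * 1e12 * 8           # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return volume_bits / effective_bps / 3600

for volume_tb, label in [(0.05, "2D land dataset"), (50, "processed 3D volume"),
                         (500, "0.5 PB offshore 3D survey")]:
    hours = transfer_time_hours(volume_tb, link_gbps=1.0)
    print(f"{label:28s}: ~{hours:,.1f} h over a 1 Gbps link at 60% efficiency")
```

At these assumed figures, a half-petabyte survey would take on the order of months over a 1 Gbps link, which is exactly the scenario where physically shipping the data wins.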
Who has authorized access to each data type also needs to be determined and agreed upon. IT teams will work with the business to delineate regional, individual or business-line privileges and access. Especially for global companies, ensuring data is stored and backed up in the proper region is a top priority. This applies not only to authorized access and download speeds, but can also tie in with country laws regarding digital data storage and security. This is a great example of why asking these types of questions is so important: they not only help save time and money, they can also help avoid potential legal complications.
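To illustrate what delineating that access might look like, here is a small, hypothetical sketch of an access matrix keyed by data type, region and business line. The data types, regions and roles are invented examples; in a real deployment these rules would typically be agreed with the business and then enforced through the cloud provider's identity and storage policies rather than application code.

```python
# Hypothetical sketch of delineating access by region and business line before
# any data is shared. All data types, regions and roles here are made-up
# examples used only to show the shape of such a matrix.

ACCESS_MATRIX = {
    # data type: regions allowed to hold copies, business lines with read access
    "field_data":        {"regions": {"eu-north"},             "roles": {"processing"}},
    "processed_volumes": {"regions": {"eu-north", "us-south"}, "roles": {"processing", "interpretation"}},
    "interpretations":   {"regions": {"us-south"},             "roles": {"interpretation"}},
}

def can_access(data_type, region, role):
    """Check a (data type, region, role) request against the agreed matrix."""
    rule = ACCESS_MATRIX.get(data_type)
    return bool(rule) and region in rule["regions"] and role in rule["roles"]

print(can_access("field_data", "eu-north", "processing"))   # True
print(can_access("field_data", "us-south", "processing"))   # False: wrong region
```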
Can seismic data integrate with other software in the cloud, and if so, how does that work? The answer is yes, it can. However, the availability of different vendor applications and full workflows in the cloud is best described as embryonic at this stage. As an example, CGG’s latest releases of its Jason, HampsonRussell and PowerLog software all now run seamlessly on Microsoft Azure. Defining a common architecture for how seismic data should be stored in the cloud and then accessed by the multitude of different applications is a key challenge for the industry. Without some form of standardization, there is a danger that although most applications will run in the cloud, each will require a different cloud architecture and storage format for the seismic data. The Open Subsurface Data Universe (OSDU) is an industry collaboration that recognizes this challenge; one of its aims is to ensure the effective integration of data storage and software applications.
Emerging technologies such as cloud storage services are indeed exciting, and whether you like it or not, the cloud is here and seems eager to stay. Many have already completed their initial journey into the cloud, some are in the middle of the process and others are just starting. I simply want to encourage everyone to ask questions and really investigate the needs of their business before acting. CGG, like other companies, wants the best for its clients and itself, so sharing ideas, successes and failures with one another will only help to improve the end results. Hang it on the bulletin board at work and make it the new company catch phrase: “ASK before you ACT”.