[2022] Pass Professional-Machine-Learning-Engineer Exam - Real Questions and Answers
Professional-Machine-Learning-Engineer Exam Questions Get Updated [2022] with Correct Answers
Google Professional-Machine-Learning-Engineer Exam Syllabus Topics:
| Topic | Details |
|---|---|
| Topic 1 | |
| Topic 2 | |
| Topic 3 | |
| Topic 4 | |
| Topic 5 | |
| Topic 6 | |
NEW QUESTION 41
You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 30 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that AutoML fits the best model to your data?
- A. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. Allow AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.
- B. Submit the data for training without performing any manual transformations. Allow AutoML to handle the appropriate transformations. Choose an automatic data split across the training, validation, and testing sets.
- C. Manually combine all columns that contain a time signal into an array. Allow AutoML to interpret this array appropriately. Choose an automatic data split across the training, validation, and testing sets.
- D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your data. Ensure that the data in your validation set is from 30 days after the data in your training set and that the data in your testing set is from 30 days after your validation set.
Answer: D
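A manual chronological split of the kind option D describes can be sketched as follows. This is a minimal illustration, not AutoML Tables code: the `event_date` field, the `train_end` cutoff, the `chronological_split` helper, and the toy rows are all hypothetical.

```python
from datetime import date, timedelta


def chronological_split(rows, train_end, horizon_days=30):
    """Assign rows to train/validation/test by date, oldest data first.

    Rows dated up to train_end go to training, the next horizon_days go to
    validation, and the horizon_days after that go to testing.
    """
    val_end = train_end + timedelta(days=horizon_days)
    test_end = val_end + timedelta(days=horizon_days)
    splits = {"train": [], "validation": [], "test": []}
    for row in rows:
        d = row["event_date"]  # hypothetical single date derived from the time columns
        if d <= train_end:
            splits["train"].append(row)
        elif d <= val_end:
            splits["validation"].append(row)
        elif d <= test_end:
            splits["test"].append(row)
    return splits


# Hypothetical toy data: one row per day for 120 days.
start = date(2022, 1, 1)
rows = [{"event_date": start + timedelta(days=i), "ltv": float(i)} for i in range(120)]
splits = chronological_split(rows, train_end=date(2022, 3, 1))
```

Because every validation row is newer than every training row, and every test row is newer still, the evaluation mimics predicting 30 days ahead rather than interpolating within the training period.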
NEW QUESTION 42
A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows 70% accuracy only.
The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases.
Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?
- A. Apply L1 or L2 regularization and dropout to the training.
- B. Increase the randomization of training data in the mini-batches used in training.
- C. Reduce the number of layers and units (or neurons) from the deep learning network.
- D. Allocate a higher proportion of the overall data to the training dataset.
Answer: A
NEW QUESTION 43
You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members of your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes. Which service should you use?
- A. Dataprep
- B. Cloud Data Fusion
- C. Dataflow
- D. Apache Flink
Answer: B
NEW QUESTION 44
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?
- A. Categorical cross-entropy
- B. Sparse categorical cross-entropy
- C. Categorical hinge
- D. Binary cross-entropy
Answer: D
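Options A and B compute the same quantity and differ only in how the label is encoded: categorical cross-entropy expects a one-hot vector, while sparse categorical cross-entropy expects an integer class index. The sketch below shows this in plain Python; the predicted probabilities are hypothetical, and the label spelling `drivers_license` is assumed.

```python
import math

LABELS = ["drivers_license", "passport", "credit_card"]


def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy for a one-hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_cross_entropy(class_index, probs):
    """Cross-entropy for an integer-encoded target."""
    return -math.log(probs[class_index])


probs = [0.7, 0.2, 0.1]  # hypothetical softmax output for one image
index = LABELS.index("passport")  # integer encoding of the true class
one_hot = [1.0 if i == index else 0.0 for i in range(len(LABELS))]

loss_one_hot = categorical_cross_entropy(one_hot, probs)
loss_sparse = sparse_categorical_cross_entropy(index, probs)
```

Both losses assume exactly one true class per image out of three, which is what distinguishes them from a binary loss.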
NEW QUESTION 45
You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:
You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?
- A. Modify the 'scale-tier' parameter
- B. Modify the 'learning rate' parameter
- C. Modify the 'epochs' parameter
- D. Modify the 'batch size' parameter
Answer: C
NEW QUESTION 46
You have trained a deep neural network model on Google Cloud. The model has low loss on the training data, but is performing worse on the validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?
- A. Apply an L2 regularization parameter of 0.4, and decrease the learning rate by a factor of 10.
- B. Run a hyperparameter tuning job on AI Platform to optimize for the learning rate, and increase the number of neurons by a factor of 2.
- C. Run a hyperparameter tuning job on AI Platform to optimize for the L2 regularization and dropout parameters.
- D. Apply a dropout parameter of 0.2, and decrease the learning rate by a factor of 10.
Answer: C
Explanation:
https://machinelearningmastery.com/introduction-to-regularization-to-reduce-overfitting-and-improve-generalization-error/
NEW QUESTION 47
A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network.
How should the Data Science team configure the notebook instance placement to meet these requirements?
- A. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.
- B. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker.
- C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.
- D. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.
Answer: D
NEW QUESTION 48
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
- A. Address the model overfitting by using a less complex algorithm.
- B. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
- C. Address data leakage by removing features highly correlated with the target value.
- D. Address data leakage by applying nested cross-validation during model training.
Answer: D
Explanation:
https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
NEW QUESTION 49
You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?
- A. Configure your pipeline with Dataflow, which saves the files in Cloud Storage. After the file is saved, start the training job on a GKE cluster.
- B. Configure a Cloud Storage trigger to send a message to a Pub/Sub topic when a new file is available in a storage bucket. Use a Pub/Sub-triggered Cloud Function to start the training job on a GKE cluster.
- C. Use Cloud Scheduler to schedule jobs at a regular interval. For the first step of the job, check the timestamp of objects in your Cloud Storage bucket. If there are no new files since the last run, abort the job.
- D. Use App Engine to create a lightweight Python client that continuously polls Cloud Storage for new files. As soon as a file arrives, initiate the training job.
Answer: B
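The event-driven flow in option B can be sketched as a Pub/Sub-triggered background function. This is a minimal sketch under stated assumptions: `start_training_run` is a hypothetical placeholder for whatever client call submits the Kubeflow Pipelines run, and the handler name `on_new_file` is illustrative. The message layout (base64-encoded JSON object metadata in `data`, with an `eventType` attribute) follows the format of Cloud Storage Pub/Sub notifications.

```python
import base64
import json


def start_training_run(gcs_uri):
    """Hypothetical placeholder: submit a Kubeflow Pipelines run for gcs_uri."""
    return {"submitted_for": gcs_uri}


def on_new_file(event, context=None):
    """Handle a Pub/Sub message produced by a Cloud Storage notification."""
    attributes = event.get("attributes", {})
    # Only react to newly created (finalized) objects, not deletes or updates.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    # The message body is base64-encoded JSON describing the object.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    gcs_uri = f"gs://{payload['bucket']}/{payload['name']}"
    return start_training_run(gcs_uri)


# Local usage example with a hand-built message (bucket and file are made up).
body = json.dumps({"bucket": "example-bucket", "name": "clean/data.csv"}).encode("utf-8")
fake_event = {
    "attributes": {"eventType": "OBJECT_FINALIZE"},
    "data": base64.b64encode(body).decode("utf-8"),
}
result = on_new_file(fake_event)
```

Compared with polling or a fixed schedule, the job starts only when a file actually arrives, and no process has to stay running in between.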
NEW QUESTION 50
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?
- A. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
- B. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
- C. Initialize the model with random weights in all layers and replace the last fully connected layer.
- D. Initialize the model with random weights in all layers including the last fully connected layer.
Answer: A
NEW QUESTION 51
You are an ML engineer at a global shoe store. You manage the ML models for the company's website. You are asked to build a model that will recommend new products to the user based on their purchase behavior and similarity with other users. What should you do?
- A. Build a regression model using the features as predictors
- B. Build a classification model
- C. Build a knowledge-based filtering model
- D. Build a collaborative-based filtering model
Answer: D
NEW QUESTION 52
A Data Scientist is training a multilayer perceptron (MLP) on a dataset with multiple classes. The target class of interest is unique compared to the other classes within the dataset, but it does not achieve an acceptable recall metric. The Data Scientist has already tried varying the number and size of the MLP's hidden layers, which has not significantly improved the results. A solution to improve recall must be implemented as quickly as possible.
Which techniques should be used to meet these requirements?
- A. Train an anomaly detection model instead of an MLP
- B. Gather more data using Amazon Mechanical Turk and then retrain
- C. Train an XGBoost model instead of an MLP
- D. Add class weights to the MLP's loss function and then retrain
Answer: D
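The class weighting in option D can be sketched in plain Python. This is an illustration only: the label names and counts are hypothetical, and the weighting rule used here is the common "balanced" heuristic, n_samples / (n_classes * class_count), which is one reasonable choice among several.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


def weighted_cross_entropy(labels, prob_of_true_class, weights):
    """Mean cross-entropy where each example is scaled by its class weight."""
    total = sum(
        weights[y] * -math.log(p) for y, p in zip(labels, prob_of_true_class)
    )
    return total / len(labels)


# Hypothetical imbalanced dataset: 90 common examples, 10 of the rare target class.
labels = ["common"] * 90 + ["rare"] * 10
weights = balanced_class_weights(labels)

# Suppose the model is equally unsure (p = 0.5) about every example.
loss = weighted_cross_entropy(labels, [0.5] * len(labels), weights)
```

With these weights, a mistake on a rare example costs nine times as much as a mistake on a common one, which pushes the retrained model toward higher recall on the rare class without changing the architecture.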
NEW QUESTION 53
You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady decline in model accuracy?
- A. Poor data quality
- B. Incorrect data split ratio during model training, evaluation, validation, and test
- C. Too few layers in the model for capturing information
- D. Lack of model retraining
Answer: D
Explanation:
Retraining is needed because the market is changing. Regular retraining is how the model stays up to date and maintains its prediction accuracy.
NEW QUESTION 54
Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?
- A. Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.
- B. Use Kubeflow Pipelines to execute the experiments. Export the metrics file, and query the results using the Kubeflow Pipelines API.
- C. Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.
- D. Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.
Answer: C
NEW QUESTION 55
You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?
- A. Ensure that training is reproducible
- B. Ensure that feature expectations are captured in the schema
- C. Ensure that model performance is monitored
- D. Ensure that all hyperparameters are tuned
Answer: C
Explanation:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf
NEW QUESTION 56
A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user.
The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.
Which strategy will allow the data scientist to identify fraudulent accounts?
- A. Create an AWS Glue crawler to infer duplicate accounts in the source data.
- B. Create a FindMatches machine learning transform in AWS Glue.
- C. Search for duplicate accounts in the AWS Glue Data Catalog.
- D. Execute the built-in FindDuplicates Amazon Athena query.
Answer: B
Explanation:
https://docs.aws.amazon.com/glue/latest/dg/machine-learning.html
NEW QUESTION 57
A Machine Learning Specialist wants to determine the appropriate SageMakerVariantInvocationsPerInstance setting for an endpoint automatic scaling configuration.
The Specialist has performed a load test on a single instance and determined that peak requests per second (RPS) without service degradation is about 20 RPS. As this is the first deployment, the Specialist intends to set the invocation safety factor to 0.5.
Based on the stated parameters and given that the invocations per instance setting is measured on a per-minute basis, what should the Specialist set as the SageMakerVariantInvocationsPerInstance setting?
- A. 2,400
- B. 0
- C. 1
- D. 2
Answer: B
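The value follows from the formula in the SageMaker load-testing guidance, SageMakerVariantInvocationsPerInstance = (MAX_RPS * SAFETY_FACTOR) * 60, where the factor of 60 converts requests per second to requests per minute. A quick check with the numbers from the question (the function name is illustrative):

```python
def invocations_per_instance(max_rps, safety_factor, seconds_per_minute=60):
    """Per-minute invocation target for one instance behind the endpoint."""
    return max_rps * safety_factor * seconds_per_minute


# 20 RPS at peak, safety factor 0.5, measured per minute.
target = invocations_per_instance(max_rps=20, safety_factor=0.5)
```

With 20 RPS and a safety factor of 0.5, the target works out to 600 invocations per instance per minute.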
NEW QUESTION 58
You are going to train a DNN regression model with Keras APIs using this code:
How many trainable weights does your model have? (The arithmetic below is correct.)
- A. 501*256+257*128+128*2 = 161408
- B. 500*256+256*128+128*2 = 161024
- C. 501*256+257*128+2 = 161154
- D. 500*256*0.25+256*128*0.25+128*2 = 40448
Answer: A
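The arithmetic in the four options can be checked directly. Terms of the form 501*256 and 257*128 have the shape (inputs + 1) * units, which is the weight matrix of a fully connected layer plus one bias per unit; a dropout layer adds no trainable weights and does not shrink the layers around it. The `dense_params` helper is illustrative, and the layer sizes passed to it are read off the options rather than taken from the model code, which is not reproduced here.

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """Trainable parameters of one fully connected layer."""
    return (n_inputs + (1 if use_bias else 0)) * n_units


# The four expressions from the options, evaluated as written.
options = {
    "A": 501 * 256 + 257 * 128 + 128 * 2,
    "B": 500 * 256 + 256 * 128 + 128 * 2,
    "C": 501 * 256 + 257 * 128 + 2,
    "D": 500 * 256 * 0.25 + 256 * 128 * 0.25 + 128 * 2,
}

# 501*256 is a 500-input, 256-unit layer with biases; 257*128 likewise.
first_layer = dense_params(500, 256)
second_layer = dense_params(256, 128)
```

The expression that scales the weight counts by 0.25 treats the dropout rate as if it removed parameters, which it does not.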
NEW QUESTION 59
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3.
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
* Store the results of joining datasets in Amazon S3.
* If one of the jobs fails, send a notification to the Administrator.
Which configuration will meet these requirements?
- A. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
- B. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
- C. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
- D. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
Answer: B
Explanation:
https://aws.amazon.com/step-functions/use-cases/
NEW QUESTION 60
You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?
- A.
  * Send incoming prediction requests to a Pub/Sub topic.
  * Transform the incoming data using a Dataflow job.
  * Submit a prediction request to AI Platform using the transformed data.
  * Write the predictions to an outbound Pub/Sub queue.
- B.
  * Send incoming prediction requests to a Pub/Sub topic.
  * Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic.
  * Implement your preprocessing logic in the Cloud Function.
  * Submit a prediction request to AI Platform using the transformed data.
  * Write the predictions to an outbound Pub/Sub queue.
- C.
  * Validate the accuracy of the model that you trained on preprocessed data.
  * Create a new model that uses the raw data and is available in real time.
  * Deploy the new model onto AI Platform for online prediction.
- D.
  * Stream incoming prediction request data into Cloud Spanner.
  * Create a view to abstract your preprocessing logic.
  * Query the view every second for new records.
  * Submit a prediction request to AI Platform using the transformed data.
  * Write the predictions to an outbound Pub/Sub queue.
Answer: A
Explanation:
https://cloud.google.com/architecture/data-preprocessing-for-ml-with-tf-transform-pt1#where_to_do_preprocessing
NEW QUESTION 61
You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?
- A. Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component's URL, and use it to load the component into your pipeline. Use the component to execute queries against BigQuery.
- B. Use the BigQuery console to execute your query and then save the query results into a new BigQuery table.
- C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute queries.
- D. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow pipeline.
Answer: A
Explanation:
https://linuxtut.com/en/f4771efee37658c083cc/
https://github.com/kubeflow/pipelines/blob/master/components/gcp/bigquery/query/sample.ipynb
https://v0-5.kubeflow.org/docs/pipelines/reusable-components/
NEW QUESTION 62
You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial, because the production model is required to keep up with market changes. Since being deployed to production, the model hasn't changed; however the accuracy of the model has steadily deteriorated. What issue is most likely causing the steady decline in model accuracy?
- A. Lack of model retraining
- B. Poor data quality
- C. Incorrect data split ratio during model training, evaluation, validation, and test
- D. Too few layers in the model for capturing information
Answer: A
NEW QUESTION 63
A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.
Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)
- A. High-degree polynomial transformation
- B. Numerical value binning
- C. One hot encoding
- D. Cross-validation
- E. Logarithmic transformation
Answer: B,E
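The effect of a logarithmic transformation on right-skewed data can be seen with a small simulation. This is a sketch on synthetic data: the log-normal income sample and its parameters are hypothetical and stand in for the real dataset.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical right-skewed incomes: many moderate values, a long tail of high ones.
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(10_000)]
log_incomes = [math.log1p(v) for v in incomes]

skew_before = skewness(incomes)
skew_after = skewness(log_incomes)
```

The raw sample has a large positive skew, while the log-transformed sample is close to symmetric. Binning achieves a related effect by replacing the long tail with a small number of ordered buckets.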
NEW QUESTION 64
An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget.
What should the Specialist do to meet these requirements?
- A. Download word embeddings pre-trained on a large corpus.
- B. Create one-hot word encoding vectors.
- C. Create word embedding vectors that store edit distance with every other word.
- D. Produce a set of synonyms for every word using Amazon Mechanical Turk.
Answer: A
Explanation:
https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-object2vec-adds-new-features-that-support-automatic-negative-sampling-and-speed-up-training/
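Why the choice of word features matters for a nearest neighbor model can be shown with cosine similarity. In the sketch below the three-word vocabulary and the dense vectors are hypothetical toy values, not real pre-trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors):
    """Most similar other word under cosine similarity."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


vocab = ["happy", "joyful", "table"]

# One-hot vectors: every pair of distinct words is orthogonal.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, w in enumerate(vocab)
}

# Hypothetical dense vectors in which words used in similar contexts lie close together.
dense = {
    "happy": [0.9, 0.8, 0.1],
    "joyful": [0.85, 0.75, 0.2],
    "table": [0.1, 0.2, 0.95],
}

one_hot_similarity = cosine(one_hot["happy"], one_hot["joyful"])
closest_to_happy = nearest("happy", dense)
```

With one-hot vectors every other word is equally far away, so a nearest neighbor search has nothing to rank. Dense embeddings learned from a large corpus place words from similar contexts near each other, which is the signal the widget needs.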
NEW QUESTION 65
You are going to train a DNN regression model with Keras APIs using this code:
How many trainable weights does your model have? (The arithmetic below is correct.)
- A. 501*256+257*128+128*2 = 161408
- B. 500*256*0.25+256*128*0.25+128*2 = 40448
- C. 500*256+256*128+128*2 = 161024
- D. 501*256+257*128+2 = 161154
Answer: A