Databricks Certified Associate Developer for Apache Spark 3.0 certification: Associate-Developer-Apache-Spark exam questions:
1. Which of the following code blocks returns a single row from DataFrame transactionsDf?
Full DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
A) transactionsDf.where(col("value").isNull()).select("productId", "storeId").distinct()
B) transactionsDf.filter(col("storeId")==25).select("predError","storeId").distinct()
C) transactionsDf.select("productId", "storeId").where("storeId == 2 OR storeId != 25")
D) transactionsDf.where(col("storeId").between(3,25))
E) transactionsDf.filter((col("storeId")!=25) | (col("productId")==2))
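One way to check the five options by hand is to count the rows each would return. The sketch below is a plain-Python model of the filters, not Spark code: the helper names (eq, ne, or3, between, where, select, distinct, row_counts) are made up for illustration, None stands in for null, and the helpers assume SQL-style three-valued logic, in which a comparison against null yields null and a filter keeps a row only when its condition is true.

```python
# Plain-Python model of transactionsDf; None stands in for SQL null.
COLS = ("transactionId", "predError", "value", "storeId", "productId", "f")
ROWS = [
    (1, 3, 4, 25, 1, None),
    (2, 6, 7, 2, 2, None),
    (3, 3, None, 25, 3, None),
    (4, None, None, 3, 2, None),
    (5, None, None, None, 2, None),
    (6, 3, 2, 25, 2, None),
]
TABLE = [dict(zip(COLS, r)) for r in ROWS]

# Three-valued comparisons: any comparison with null is null (None).
def eq(a, b):
    return None if a is None or b is None else a == b

def ne(a, b):
    return None if a is None or b is None else a != b

def between(a, lo, hi):
    return None if a is None else lo <= a <= hi

def or3(a, b):
    # true OR anything is true; otherwise null if either side is null.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def where(rows, pred):
    # Keep a row only when the predicate is strictly true.
    return [r for r in rows if pred(r) is True]

def select(rows, *cols):
    return [tuple(r[c] for c in cols) for r in rows]

def distinct(rows):
    # Order-preserving de-duplication; None compares equal to None here.
    return list(dict.fromkeys(rows))

def row_counts():
    a = distinct(select(where(TABLE, lambda r: r["value"] is None),
                        "productId", "storeId"))
    b = distinct(select(where(TABLE, lambda r: eq(r["storeId"], 25)),
                        "predError", "storeId"))
    c = select(where(TABLE, lambda r: or3(eq(r["storeId"], 2),
                                          ne(r["storeId"], 25))),
               "productId", "storeId")
    d = where(TABLE, lambda r: between(r["storeId"], 3, 25))
    e = where(TABLE, lambda r: or3(ne(r["storeId"], 25),
                                   eq(r["productId"], 2)))
    return {label: len(rows) for label, rows in zip("ABCDE", (a, b, c, d, e))}

print(row_counts())  # {'A': 3, 'B': 1, 'C': 2, 'D': 4, 'E': 4}
```

Under this model only option B collapses to a single row, because the three rows with storeId 25 all have predError 3. Transaction 5 is the instructive case: its null storeId drops it from option C, yet it survives option E because productId == 2 is true on its own.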
2. The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.
Code block:
spark.collect(transactionsDf.select("storeId", "predError"))
A) Instead of collect, collectAsRows needs to be called.
B) Columns storeId and predError need to be represented as a Python list, so they need to be wrapped in brackets ([]).
C) The collect method is not a method of the SparkSession object.
D) Instead of select, DataFrame transactionsDf needs to be filtered using the filter operator.
E) The take method should be used instead of the collect method.
3. Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?
A) transactionsDf.distinct("storeId")
B) transactionsDf.filter("storeId").distinct()
C) transactionsDf.select(col("storeId").distinct())
D) transactionsDf.select("storeId").distinct()
E) transactionsDf["storeId"].distinct()
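To see what "all unique values of column storeId" amounts to for the sample data shown in question 1, here is a small plain-Python sketch of select followed by distinct. It is a model rather than Spark code: select_distinct is a hypothetical helper, None stands in for null, and the sketch assumes that null is kept as one distinct value of its own.

```python
# storeId column of the sample transactionsDf, top to bottom;
# None stands in for null.
STORE_IDS = [25, 2, 25, 3, None, 25]

def select_distinct(values):
    """Keep the first occurrence of each value; None counts as a value."""
    return list(dict.fromkeys(values))

unique_store_ids = select_distinct(STORE_IDS)
print(unique_store_ids)  # [25, 2, 3, None]
```

The model returns four values, null included. In Spark itself the order of the resulting rows is not guaranteed, so only the set of values should be relied on.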
4. Which of the following is one of the big performance advantages that Spark has over Hadoop?
A) Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.
B) Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.
C) Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.
D) Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.
E) Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.
5. Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?
Schema of first partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)
Schema of second partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)
A) spark.read.parquet(filePath, mergeSchema='y')
B) spark.read.option("mergeSchema", "true").parquet(filePath)
C)
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df
D) spark.read.parquet(filePath)
E)
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df
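The phrase "all columns are included exactly once" can be made concrete with a short plain-Python sketch that works on the column names of the two partition schemas above. This is a model, not Spark's implementation: merge_schemas and concatenate_schemas are hypothetical helpers, column types are left out because every column is an integer, and the resulting column order is an assumption of the model.

```python
# Column names of the two partition schemas from the question.
FIRST = ["transactionId", "predError", "value", "storeId", "productId", "f"]
SECOND = ["transactionId", "predError", "value", "storeId", "rollId", "f",
          "tax_id"]

def merge_schemas(*schemas):
    """Union of column names: each name is kept once, in first-seen order."""
    merged = []
    for schema in schemas:
        for name in schema:
            if name not in merged:
                merged.append(name)
    return merged

def concatenate_schemas(*schemas):
    """Naive side-by-side combination: shared names appear once per schema."""
    return [name for schema in schemas for name in schema]

merged = merge_schemas(FIRST, SECOND)
concatenated = concatenate_schemas(FIRST, SECOND)

print(len(merged), merged)
print(len(concatenated))
```

In this model the merged schema has 8 columns with no repeats, while putting the two schemas side by side, as a join without key columns does, gives 13 column names with the five shared ones duplicated. The two schemas also differ in length (6 versus 7 columns), which is why stacking the partitions by position does not line up.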
Questions and answers:
| Question 1 correct answer: B | Question 2 correct answer: C | Question 3 correct answer: D | Question 4 correct answer: E | Question 5 correct answer: B |














