Spark DataFrame 02 (PySpark)
Spark DataFrame
select
In [3]:
df = spark.read.json("data/2015-summary.json")
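These examples assume a SparkSession bound to the name spark, which the pyspark shell and Jupyter kernels create automatically. For a standalone script, a minimal sketch (the local master and app name here are illustrative assumptions):

from pyspark.sql import SparkSession

# Create or reuse a local SparkSession; master and appName are arbitrary choices.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("dataframe-examples") \
    .getOrCreate()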
In [6]:
df.printSchema()
In [8]:
df.select("DEST_COUNTRY_NAME").show()
In [9]:
df.select(["DEST_COUNTRY_NAME","count"]).show()
withColumn
In [12]:
df.withColumn("newCount", df["count"]+2).show()
In [15]:
df.withColumnRenamed("count", "renameCount").show()
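Both calls return a new DataFrame instead of modifying df in place, so assign the result if you want to keep it:

# DataFrames are immutable; df itself is unchanged by the calls above.
df2 = df.withColumn("newCount", df["count"] + 2)
df2.printSchema()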
groupBy
In [18]:
df.groupBy("DEST_COUNTRY_NAME").count().show()
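groupBy().count() counts the rows in each group. To aggregate the values of the existing count column instead, use agg(); a sketch:

from pyspark.sql import functions as F

# Sum and max of the count column per destination, with readable aliases.
df.groupBy("DEST_COUNTRY_NAME").agg(F.sum("count").alias("total"), F.max("count").alias("max_count")).show()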
filter
In [19]:
df = spark.read.csv("data/appl_stock.csv", inferSchema=True, header=True)
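header=True treats the first row as column names, and inferSchema=True makes Spark scan the file once to guess the column types. For repeated jobs an explicit schema avoids that extra pass; a sketch, assuming the column layout below (guessed from the fields used in this post):

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, LongType

# The exact column list is an assumption; adjust it to match the actual file.
schema = StructType([
    StructField("Date", StringType(), True),
    StructField("Open", DoubleType(), True),
    StructField("High", DoubleType(), True),
    StructField("Low", DoubleType(), True),
    StructField("Close", DoubleType(), True),
    StructField("Volume", LongType(), True),
])
df_typed = spark.read.csv("data/appl_stock.csv", header=True, schema=schema)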
In [20]:
df.printSchema()
In [22]:
df.filter("Close < 500").show()
In [23]:
df.filter("Close < 500").select("Open").show()
In [24]:
df.filter("Close < 500").select(["Open", "Close"]).show()
In [25]:
df.filter(df["Close"] < 200).show()
In [28]:
df.filter((df["Close"] < 200) & (df['Open'] > 200)).show()
In [29]:
df.filter((df["Close"] < 200) | (df['Open'] > 200)).show()
In [30]:
df.filter((df["Close"] < 200) & ~(df['Open'] > 200)).show()
In [32]:
df.filter(df["Low"] == 197.16).show()