Spark DataFrame 02 (PySpark)
Spark DataFrame
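The cells below assume a SparkSession is already available under the name spark, as it is in the pyspark shell or a notebook launched through it. A minimal sketch for creating one in a standalone script (the application name is illustrative, not from this post):

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession. In the pyspark shell or a
# notebook kernel this object already exists as `spark`, so this
# block can be skipped there.
spark = (
    SparkSession.builder
    .appName("dataframe-examples")  # illustrative name
    .getOrCreate()
)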
select
In [3]:
df = spark.read.json("data/2015-summary.json")
In [6]:
df.printSchema()
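printSchema() prints the column names and their types as a tree. Assuming this file is the 2015 flight-summary sample that ships with Spark: The Definitive Guide, it has three fields: DEST_COUNTRY_NAME and ORIGIN_COUNTRY_NAME as strings and count as a long.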
In [8]:
df.select("DEST_COUNTRY_NAME").show()
In [9]:
df.select(["DEST_COUNTRY_NAME","count"]).show()
withColumn
In [12]:
df.withColumn("newCount", df["count"]+2).show()
In [15]:
df.withColumnRenamed("count", "renameCount").show()
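Note that withColumn and withColumnRenamed both return new DataFrames; df itself is unchanged unless the result is assigned back, e.g. df = df.withColumnRenamed("count", "renameCount").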
groupBy
In [18]:
df.groupBy("DEST_COUNTRY_NAME").count().show()
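count() is just one built-in aggregation; agg allows arbitrary aggregate functions over the grouped data. A sketch summing the count column instead of counting rows (the output alias totalCount is illustrative):

from pyspark.sql.functions import sum as sum_

# Sum the `count` values per destination country.
df.groupBy("DEST_COUNTRY_NAME").agg(sum_("count").alias("totalCount")).show()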
filter
In [19]:
df = spark.read.csv("data/appl_stock.csv", inferSchema=True, header=True)
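header=True treats the first line of the CSV as column names, and inferSchema=True makes Spark scan the data an extra time to guess each column's type; without it, every column is read as a string.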
In [20]:
df.printSchema()
In [22]:
df.filter("Close < 500").show()
In [23]:
df.filter("Close < 500").select("Open").show()
In [24]:
df.filter("Close < 500").select(["Open", "Close"]).show()
In [25]:
df.filter(df["Close"] < 200).show()
In [28]:
df.filter((df["Close"] < 200) & (df['Open'] > 200)).show()
In [29]:
df.filter((df["Close"] < 200) | (df['Open'] > 200)).show()
In [30]:
df.filter((df["Close"] < 200) & ~(df['Open'] > 200)).show()
In [32]:
df.filter(df["Low"] == 197.16).show()