Running SparkContext raises an error:
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/spark/python/pyspark/shell.py:59
hadoop@rachel-virtual-machine:/usr/local/spark/bin$ ./pyspark
./pyspark: line 45: python: command not found
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-08-28 15:27:12 WARN Utils:66 - Your hostname, rachel-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.80.128 instead (on interface ens33)
2019-08-28 15:27:12 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-08-28 15:27:23 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.3
      /_/
Using Python version 3.6.8 (default, Jan 14 2019 11:02:34)
SparkSession available as 'spark'.
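Note the "line 45: python: command not found" message during startup: the bin/pyspark launcher looks for a `python` executable, which many distributions no longer provide when only Python 3 is installed. The shell evidently fell back to Python 3.6.8 here, so it is only a warning, but a common way to silence it (an assumption about this particular setup, not something shown in the session) is to set PYSPARK_PYTHON=python3 in the environment or in conf/spark-env.sh.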
>>> from pyspark import SparkContext
>>> sc = SparkContext( 'local', 'test')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/spark/python/pyspark/context.py", line 129, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/usr/local/spark/python/pyspark/context.py", line 328, in _ensure_initialized
callsite.function, callsite.file, callsite.linenum))
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/spark/python/pyspark/shell.py:59
This error occurs because the pyspark shell already created a SparkContext at startup (the `sc` behind the "SparkSession available as 'spark'" message), and Spark allows only one active context per process. The existing context must be stopped before a new one can be created:
>>> sc.stop()
>>> sc=SparkContext("local","test")
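As an alternative to the stop/restart just shown, SparkContext.getOrCreate() (part of the PySpark API since Spark 1.4) returns the already-running context instead of raising ValueError, and only creates a fresh one when none exists. A minimal sketch:

>>> from pyspark import SparkContext
>>> sc = SparkContext.getOrCreate()  # reuses the shell's existing context instead of raising

With a working context in hand, the session below runs a quick sanity check that counts lines in Spark's bundled README.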
>>> logFile = "file:///usr/local/spark/README.md"
>>> logData = sc.textFile(logFile, 2).cache()
>>> numAs = logData.filter(lambda line: 'a' in line).count()
>>> numBs = logData.filter(lambda line: 'b' in line).count()
>>> print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
Lines with a: 61, Lines with b: 30
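The same job can also be run outside the interactive shell, which sidesteps the duplicate-context problem entirely because a standalone script creates and owns its own SparkContext. A minimal sketch (the file name count_ab.py is hypothetical; the Spark install path matches the one used above):

# count_ab.py -- run with: /usr/local/spark/bin/spark-submit count_ab.py
from pyspark import SparkContext

if __name__ == "__main__":
    # The script owns this context; no shell-created one exists to conflict with.
    sc = SparkContext("local", "test")
    logData = sc.textFile("file:///usr/local/spark/README.md").cache()
    numAs = logData.filter(lambda line: 'a' in line).count()
    numBs = logData.filter(lambda line: 'b' in line).count()
    print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
    sc.stop()  # release the context so later runs can create their own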